Query: 192 VEAHPIDELLHjYWINAEAETDPSTOEi^EWFRKXEANDPEATELWQWFRDESLLEFN 251 

V+AHPIDELLKLYVRINAE3ySTDP+VDEEAREWFRKLE D EATELWQWFRDESLLEFN 
Sbjct: 181 VQAHPIDELLKLYVRINAEMTDPTVDEEMEWFRKLEDGDKEATELWQWFRDESIiLEFN 240 

Query: 252 RLYDQmmFDSYMGEAFVMDKMDEVLELLESKNLLVESKGAQVVNLEKYGIEHPALIKK 311 

RLYDQ++VTFDSYWGFAFYOT)IO©EVI.+I£.E+K^LVESKGAQVVNLEKYGIEHPALIKK 
Sbjct: 241 RLYDQLHVTFDSYNGEAFYNDKMDEAnjDLDFJUCNLLVESKGAQVVNLEKYGIEHPALIKK 300 

Query: 312 SDGATLYITRDLAAALYRKRTYDFAKSIYWGNEQSAHFKQLKAVLKEMDYDWSDDMTHV 371 
SDGATLYITFJDLAAALYRKRTYDFAKS+YWGNEQ+AHFKQLKA.VLKEM YDWSDDMTHV 

Sbjct: 301 SDGATLYITRDLAAALYRKRTYDFAKSVYWGNEQAAHFKQLKAVLKEMGYDWSDDMTHV 360 

Query: 372 PFGLVTKGGAKLSTRKGOTILLEPTVAEAINFAASQIEAKNPNLADKDKVAQAVGVGAIK 431 
FGLOTKGGAKLSTRKGNVILLEP7VAEAINRAASQIEAKWPNLADK+ VA AVGVGAIK 
Sbjct: 361 AFGLVTKGGAKLSTRKGNVILLEPTVAEAIiWAASQIEAKNPNIiADKEAVAHAVGVGAIK 420 

Query: 432 FYDLKTDRTNGYDFDLEAMVSFEGETGPYVQYAHARIQSILRKANFSPSNSDNYSLNDVE 491 

FYDLKTDR NGYDFDLEAMVS FEGETGPYVQYAHAR I QS I LRKA+F+ PS + YSL D E 
Sbjct: 421 FYDLKTDRMNGYDFDLEAMVSFEGETGPWQYAHARIQSILRKADFTPSATTTYSLADAE 480 

Query: 492 SWEIIKLIQDFPRIIVRAADNFEPSIIA:<;FAINIAQCFNKYYAHTRILDEDAEISSRLAL 551 

SWEIIKLIQDFPRII R +DNFEPS I +AKFAINLAQ FNKYYAHTRILD+++E +RLAL 
Sbjct: 481 SWEIIKLIQDFPRIIKRTSDNFEPSIMAKFAINLAQSFNKYYAHTRILDDKSERDNRLAL 540 

Query: 552 CYATATVLKESLRLLGVDAPNEM 574 

CYATATVLKE+LRLLGVDAPNEM 
Sbjct: 541 CYATATVLKEALRLLGVDAPNEM 563 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1076 

A DNA sequence (GBSx1150) was identified in S.agalactiae <SEQ ID 3315> which encodes the amino 
acid sequence <SEQ ID 3316>. This protein is predicted to be arginine hydroxamate resistance protein 
(argR). Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3252 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
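
The "Final Results" blocks in these Examples all follow one line pattern (location, certainty, verdict). As an illustration only, the following Python sketch shows one way such lines could be read programmatically to pick the highest-certainty location. The function names `parse_final_results` and `best_location` are hypothetical and are not part of the localisation program used for the analysis; the regular expression is an assumption that tolerates the stray spaces seen in this text.

```python
import re

# Assumed line shape: "bacterial <location> --- Certainty=<value> (<verdict>) < succ>"
# The pattern tolerates missing dashes and stray spaces inside the number.
LINE_RE = re.compile(
    r"bacterial\s+(?P<loc>\w+)\W*Certainty\s*=\s*"
    r"(?P<val>[0-9.\s]+?)\s*\((?P<verdict>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples found in text."""
    results = []
    for match in LINE_RE.finditer(text):
        certainty = float(match.group("val").replace(" ", ""))
        results.append((match.group("loc"), certainty, match.group("verdict").strip()))
    return results


def best_location(text):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda item: item[1]) if results else None


sample = """
bacterial cytoplasm --- Certainty=0.3252 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

print(best_location(sample))
```

For the values reported in this Example, the sketch selects the cytoplasm entry, which matches the "Affirmative" verdict shown in the block.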

A related GBS nucleic acid sequence <SEQ ID 10269> which encodes amino acid sequence <SEQ ID 
10270> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 4 MNKIERQKRIKRLIQSGQIGTQEEIKLHLKNEGIDOTQATLSRDLREIGLLKLRSPEGKIi 63 

M K +R + IK++I ++ TQ+EI+ L+ + VTQ TLSRDLREIGL K++ + 
Sbjct: 1 MRKRDRHQLIKKMITEEKLSTQKEIQDRLEAHNVCVTQTTLSRDLREIGLTKVTCKN^ 60 

Query: 64 YYSLSTATSl^FSPALRSYILKVSRASFMLVI^TNLGFJ^VIANFIDEKGLPEILGTMAG 123 

Y ++ L ++ V+RA F LVL+T LGEASVIAN +D ILGT+AG 

Sbjct: 61 YVLVNETEKIDLVEFLSHHLEGVARAEFTLVLHTKLGFJ^VIANIVDVNKDEWILGTVAG 120 

Query: 124 ADTLLVICQNEDIAKVFEKEL 144 

AH-TLLVIC+++ +AK+ E L 
Sbjct: 121 ANTLLVICRDQHVAKLMEDRL 141 
A related DNA sequence was identified in S. pyogenes <SEQ ID 3317> which encodes the amino acid 
sequence <SEQ ID 3318>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3176 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/145 (69%) , Positives = 121/145 (82%) 

Query: 4 MNKIERQKRIKRLIQSGQIGTQEEIKLHLKNEGIDVTQATLSRDLREIGLLKLRSPEGKL 63 

MNK+ERQ+ + 1 KR+ 1 Q+ IGTQE+IK HL+ EGI VTQATLSRDLREIGLLKLR +GKL 
Sbjct: 1 MNKMERQQQI KRI I QAEHIGTQED I KNHLQKEGI WTQATLSRDLRK 1 GLLiKLRDEQGKi 60 

Query: 64 YYSLSTATSNRFSPALRSYILKVSRASFMLVIiNTNLGEASVIANFIDEKGLPEILGTMAG 123 

YYSLS + FSP +R Y+LKV RA FMLVL+TNLGEA VLAN ID + +ILGT+AG 
Sbjct: 61 YYSLSEPVATPFSPEWFYVLKTORAGFMLVLHTNLGEADVLANLIDNDAIEDILGTIAG 120 

Query: 124 ADTLLVICQNEDIAKVFEKELSVGL 148 

ADTLLVIC++E+IAK FEK+L+ GL 
Sbjct: 121 ADTLLVI CRDEE IAKRFEKDIAAGL 145 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
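
The alignment summaries in these Examples report identities and positives as counts over the aligned length, followed by a whole-number percentage. As a hedged illustration, the Python sketch below extracts those counts from a summary line and recomputes the ratio to one decimal place. The helper names `parse_summary` and `percent` are hypothetical, and the sketch makes no claim about how the alignment program itself rounds the printed percentages.

```python
import re

# Matches "Identities = 101/145", "Positives = 121/145", "Gaps = 14/858".
PAIR_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Map each reported statistic to a (count, aligned_length) pair."""
    return {name.lower(): (int(num), int(den)) for name, num, den in PAIR_RE.findall(line)}


def percent(counts):
    """Recompute the ratio as a percentage, rounded to one decimal place."""
    num, den = counts
    return round(100.0 * num / den, 1)


line = "Identities = 101/145 (69%) , Positives = 121/145 (82%)"
stats = parse_summary(line)
print(stats["identities"], percent(stats["identities"]))
```

Keeping the raw counts rather than the printed percentage lets alignments of different lengths be compared on the same footing.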

Example 1077 

A DNA sequence (GBSx1151) was identified in S.agalactiae <SEQ ID 3319> which encodes the amino 
acid sequence <SEQ ID 3320>. This protein is predicted to be DNA mismatch repair protein hexa (mutS). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3570 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88597 GB:M18729 mismatch repair protein [Streptococcus pneumoniae] 
Identities = 593/858 (69%), Positives = 698/858 (81%), Gaps = 14/858 (1%) 

Query: 1 MAKPTISPGMQQYLDIKENYPDAFLLFRMGDFYELFYDDAVKAAQILEISLTSRNKNAEK 60 

MA +SPGMQQY+DIK+ YPDAFLLFRMGDFYELFY+DAV AAQILEISLTSRNKNA+ 
Sbjct: 1 MAIEKLSPGMQQYVDIKKQYPDAFLLFRKGDFYELFYEDAVNAAQILEISLTSRNKNADN 60 

Query: 61 PIPMAGVPYHSAQQYIDVLVELGYK\fflIAEQMEDPKKAVGVVlffi^ 120 

PIPMAGVPYHSAQQYIDVL+E GYKVAIAEQMEDPK+AVGWKREWQV+TPGTW+S+K 
Sbjct: 61 PIPMAGVPYHSAQQYIDVLIEQGYKVAIAEQNEDPKQAVGWKREWQVITPGTVVDSSK 120 

Query: 121 PDSANNFLVAIDSQDQQTFGLAYMDVSTGEFQATLLTDFESVRSEILNLKAREIWGYQL 180 

PDS NNFLV+ID + Q FGLAYMD+ TG+F T L DF V EI NLKARE+V+GY L 
Sbjct: 121 PDSQNNFLVSIDREGNQ-FGIAYMDLVTGDFYVTGLLDFTLVCGEIRNLKAREVVLGYDL 179 



Query: 181 TDEKNHLLTKQMNLLLS YEDERLND IHL IDEQLTDLEI SAAEKLLQYVHRTQKREIiSHLQ 240 
++E+ +L++QMNL+LSYE E D+HL+D +L +E +A+ KLLQYVHRTQ REL+HL+ 
Sbjct: 180 SEEEEQILSRQMNLVLSYEKESFEDLHLLDLRIATVEQTASSKLLQYVHRTQMREIJSHLK 239 

Query: 241 KWHYEIKDYLQMSYATKNSLDLLENflRTSKKHGSLYWLLDETKTAMGTRMIjRTWIDRPIj 300 

V+ YEIKD+LQM YATK SLDL+ENAR+ KK GSL+WLLDETKTAMG R+LR4WI RPL 
Sbjct: 240 PVIRYEIKDFLQMDYATKASLDLVENftRSGKKQGSLFWLLDETKTAMGMRLLRSWIHRPL 299 

Query: 301 VSMNRIKERQDIIQVFLDYFFERNDLTESLKGVYDIERLASRVSFGKANPKDLLQIiGQTL 360 

+ RI +RQ+++QVFLD+FFER+DLT+SLKGVYDIERLASRVSFGK NPKDLLQL TL 
Sbjct: 300 IDKERIVQRQEWQVFLDHFFERSDLTDSLKGVYDIERIASRVSFGKTNPKDLLQIATTL 359 

Query: 361 SQIPRIKMILQSFNQPELDIIVNKIDTMPELESLINTAIAPEAQATITEGNIIKSGFDKQ 420 

S +PRI+ IL+ QP L ++ ++D +PELESLI+ AIAPEA IT+G II++GFD+ 
Sbjct: 360 SSVPRIRAILEGMEQPTLAYLIAQLDAIPELESLISAAIAPEAPHVITDGGIIRTGFDET 419 

Query: 421 LDNYRTVMREGTGWIADIEAKERAASGIGTliKIDYNKKDGYYFHVTNSNLSLVPEHFFRK 480 

LD YR V+REGT WIA4 IEAKER SGI TLKIDYNKKDGYYFHVTNS L VP HFFRK 
Sbjct: 420 LDKYR(^REGTSWIAEIFAKERENSGISTLKIDYNKKDGYYFHVTNSQLGNVPAHFFRK 479 

Query: 481 ATLKNSERYGTAELAKIEGEMLEAREQSSNLEYDIFMRVRAQVESYIKRLQELAKTIATV 540 

ATLKNSER+GT ELA+IEG+MLEARE+S+NLEY+IFMR+R +V YI+RLQ LA+ IATV 
Sbjct: 480 ATLKNSERFGTEE1ARIEGDMLEAREKSANLEYEIFMRIREEVGKYIQRLQALAQGIATV 539 

Query: 541 DVLQSIAWAENYHYWPKFMDQHQIKIKNGRHATVEKVMGVQEYIPNSIYFDSQTDIQL 600 

DVLQSLAWAE H +RP+F D Ql 1+ GRHA VEKVMG Q YIPN+I T IQL 

Sbjct: 540 DVLQSIiAWAETQHLIRPEFGDDSQIDIRKGRHAVVEKVMGAQTYIPNTIQMAEDTSIQL 599 

Query: 601 ITGPNMSGKSTYMRQLALTVIMAQKGGFVSADEVDLPVFDAIFTRIGAADDLISGQSTFM 660 

+TGPNMSGKSTYMRQLA+T +MAQ+G +V A+ LP+FDAIFTRIGAADDL+SGQSTFM 
Sbjct: 600 VTGPNMSGKSTYMRQLAMTAVMAQLGSYVPAESAHLPIFDAIFTRIGAADDLVSGQSTFM 659 

Query: 661 VEMMEANQAVKRASDKSLILFDELGRGTATYDGMALAQSIIEYIHDRVRAKTMFATHYHE 720 

VEMMEAN A+ A+ SLILFDELGRGTATYDGMALAQSIIEYIH+ + AKT+ FATHYHE 
Sbjct: 660 VEMMEANNAISHATKNSLILFDELGRGTATYDGMALAQSIIEYIHEHIGAKTLFATHYHE 719 

Query: 721 LTDLSEQLTRLVNVHVATLERDGEVTFLHKIESGPADKSYGIUVAKIAGLPIDLLDRATD 780 

LT h L LVNVHVATLE+DG+VTFLHKIE GPADKSYGIHVAKIAGLP DLL RA 
Sbjct: 720 LTSLESSLQHLVNVHVATLEQDGQVTFLHKIEPGPADKSYGIHVAKIAGLPADLLARADK 779 

Query: 781 ILSQLEADAVQLIVSPSQEAVTADLNEELDSEKQCGQLSLFEEPSNAGRVIEELEAIDIM 840 

IL+QLE + SP T+ + E Q+SLF+ + ++ EL +D+ 

Sbjct: 780 ILTQLENQGTE S PPPMRQTSAVTE QISLFDR-AEEHPILAELAKLDVY 826 

Query: 841 NLTPMQAMNAI FDLKKLL 858 

N+TPMQ MN + +LK+ L 
Sbjct: 827 NMTPMQVMNVLVELKQKL 844 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3321> which encodes the amino acid 
sequence <SEQ ID 3322>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 532 - 548 ( 532 - 549) 

Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 661/858 (77%), Positives = 746/858 (85%), Gaps = 7/858 (0%) 

Query: 1 MAKPTISPGMQQYLDIKENYPDAFIiFRMGDFYELFYDDAVKAAQILEISLTSRNKNAEK 60 

MAK ISPGMQQYLDIK++YPDAFLLFRMGDFYELFY+DAVKAAQ+LEI LTSRNKNAE 
Sbjct: 1 MAKTNIS PGMQQYLD I KKDYPDAFIiLFRMGDFYELFYEDAVKAAQLLEIGLTSRNtCNAEN 60 

Query: 61 PIPMAGVPYHSAQQYIDVLWLGYKVAIAEQMEDPKKAVGWKREVVQVvTPGTVVESTK 120 
PIPMAGVP+HSAQQYIDVL+ELGYKVA+AEQMEDPK+AVGVVKREWQV+TPGTW-t-S K 
Sbjct: 61 PIPMAGVPHHSAQQYIDVLIELGYKVAVAEQMEDPKQAVGWKREWQVITPGTWDSAK 120 

Query: 121 PDSANNFLVAIDSQDQQTFGIAYMDVSTGEFQATLLTDFESVRSE I LNLKARE I WGYQL 180 

PDSANNFLVA+D D +GLAYMDVSTGEF T L DF SVRSEI NLKA+E+++G+ L 
Sbjct: 121 PDSMTOFLVAVDF-DGCRYGIAYMDVSTGEPCVTDLADFTSVRSEIQNLKAKEVLLGFDL 179 

Query: 181 TDEKNHLLTKQMNLLLSYEDERLNDIHLIDEQLTDLEISAAEKLLQYVHRTQKRELSHLQ 240 

++E+ +L KQMNLLLSYE+ D LID QLT +E++AA KLLQYVH+TQ RELSHLQ 
Sbjct: 180 SEEEQTILVKQMNLLLSYEETVYEDKSLIDGQLTTOELTAAGKLLQYVHKTQMRELSHLQ 239 

Query: 241 KWHYEIKDYLQMSYATKNSLDLLENARTSKKHGSLYWLLDETKTAMGTRMLRTWIDRPL 300 

+VHYEIKDYLQMSYATK+SLDL+EMART-i-KICHGSLYWLLDETKTAMG R+LR+WIDRPL 
Sbjct: 240 ALVHYEIKDYLQMSYATKSSLDLVENARTNKKHGSLYWLLDETKTAMGMRLLRSWIDRPL 299 

Query: 301 VSMNRI KERQD 1 1 QVFLDYFFERNDLTESLKGVYDIERLASRVS FGKANPKDLLQLGQTL 360 

VS I ERQ+IIQVFL+ F ER DL+ SLKGVYDIERL+ SRVSFGKANPKDLLQLG TL 
Sbjct: 300 VSKEAILERQEIIQVFLNAFIERTDLSNSLKGVYDIERLSSRVSFGKANPKDLLQLGHTL 359 

Query: 361 SQIPRIKMILQSFNQPELDIIVNKIDTMPELESL1NTAIAPEAQATITEGN1IKSGFDKQ 420 
+Q+P IK IL+SF+ P +D +VN ID++PELE LI TAI P+A ATI+EG+II++GFD++ 
Sbjct: 360 AQVPYIKAILESFDSPCVDKLVNDIDSLPELEYLIRTAIDPDAPATISEGSIIRNGFDER 419 

Query: 421 LDNYRTVMREGTGWIADIEAKERAASGIGTLKIDYNKKDGYYFHVTNSNLSLVPEHFFRK 480 

LD+YR VMREGTGWIADIEAKER ASGI LKIDYNKKDGYYFHVTNSHLSLVPEHFFRK 
Sbjct: 420 LDHYRKVMREGTGWIADIEAKERQASGIWNLKIDYNKKDGYYFHVTNSNLSLVPEHFFRK 479 

Query: 541 DVLQSLAWAENYHYVRPKFNDQHQIKIKNGRHATVEKVMGVQEYIPNSIYFDSQTDIQL 600 

DVLQSLAWAE HY+RP+FND H I 1+ GRHA VEKVMGVQEYIPNSI FD QT IQL 
Sbjct: 540 DVLQSLAVVAETNHYIRPQFIsroNHVITIQEGRHAWEKVMGVQEYIPNSISFDQQTSIQL 599 

Query: 601 ITGPNMSGKSTYMRQLALTVIMAQMGGFVSADEVDLPVFDAIFTRIGAADDLISGQSTFM 660 

ITGPNMSGKSTYMRQLALTVIMAQMG FV+AD VDLP+FDAIFTRIGAADDLISGQSTFM 
Sbjct: 600 ITGPNMSGKSTYMRQLALTVIMAQMGSFVAADHVDLPLFDAIFTRIGAADDLISGQSTFM 659 

Query: 661 VEMMEANQAVKRASDKSLILFDELGRGTATYDGMALAQSIIEYIHDRVRAKTMFATHYHE 720 

VEMMEANQA+KRASD SLILFDELGRGTATYDGMALAQ+IIEYIHDRV AKT+ FATHYHE 
Sbjct: 660 VEMMEANQAIKRASDNSLILFDELGRGTATYDGMALAQAIIEYIHDRVGAKTIFATHYHE 719 

Query: 721 LTDLSEQLTRLYNVHVATLERDGEVTFLHKIESGPADKSYGIHVAKIAGLPIDLLDRATD 780 

LTDLS LT LWVHVATLE+DG+VTFLHKI GPADKSYGIHVAKIAGLP LL RA + 
Sbjct: 720 LTDLSTNLTSLWVHVATLEKDGDVTFLHKIAEGPADKSYGIHVAKIAGLPKSLLKRADE 779 

Query: 781 ILSQLEADAVQLIVSPSQEAVTADLNEELDSEKQQGQLSLFEEPSNAGRVIEELEAIDIM 840 

+L++LE S S E ++ E S +QGQLSLF + A + + LE ID+M 

Sbjct: 780 VLTRLETQ SRSTEIISVPSQVESSSAVRQGQLSLFGDEEKAHEIRQALEVIDVM 833 

Query: 841 NLTPMQAMRAI FDLKKLL 858 

N+TP+QAM +++LKKLL 
Sbjct: 834 NMTPLQAMTTLYELKKLL 851 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1078 

A DNA sequence (GBSx1152) was identified in S.agalactiae <SEQ ID 3323> which encodes the amino 
acid sequence <SEQ ID 3324>. This protein is predicted to be cold shock protein-related protein. Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2095 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69404 GB:A91080 unnamed protein product [unidentified] 
Identities = 48/63 (76%) , Positives = 56/63 (88%) 

Query: 1 MTQGTVTSWFNSEKGFGFISSETGTDVFAHFSSIKVEGFKTLEHGQKVTFDIQDGQRGPQA 60 

MT+GTVKWFN +KGFGFI+SE G DVFAHFS+I+ GFKTL+EGQKVTFD++ GQRGPQA 
Sbjct: 1 MTKGTVTCWFNPDKGFGFITSEDGQDVFAHFSQIQTSGFKTLDEGQKVTFDVKAGQRGPQA 60 

Query: 61 TNI 63 

NI 

Sbjct: 61 VNI 63 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3325> which encodes the amino acid 
sequence <SEQ ID 3326>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2350 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 49/63 (77%), Positives = 56/63 (88%) 

Query: 1 MTQGTVKWFNSEKGFGFISSETGTDVFAHFSEIKVDGFKTLEEGQKVTFDIQDGQRGPQA 60 

M QGTVKWFN+EKGFGFIS+E G DVFAHFS 1+ +GFKTLEEGQKV FD+++GQRGPQA 
Sbjct: 3 MAQGTVKWFNAEKGFGFISTENGQDVFAHFSAIQTNGFKTLEEGQKVAFDVEEGQRGPQA 62 

Query: 61 TNI 63 
NI 

Sbjct: 63 VNI 65 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1079 

A DNA sequence (GBSx1153) was identified in S.agalactiae <SEQ ID 3327> which encodes the amino 
acid sequence <SEQ ID 3328>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.6378 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1080 

A DNA sequence (GBSx1154) was identified in S.agalactiae <SEQ ID 3329> which encodes the amino 
acid sequence <SEQ ID 3330>. This protein is predicted to be DNA mismatch repair protein hexb (mutL). 
Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2242 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10267> which encodes amino acid sequence <SEQ ID 
10268> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88600 GB:M29686 mismatch repair protein [Streptococcus pneumoniae] 
Identities = 452/657 (68%) , Positives = 543/657 (81%) , Gaps = 8/657 (1%) 

Query: 20 LSKIIELPDILANQIAAGEVVERPSSWKELvENAIDAGSSQITIEVEESGLKKIQITDN 79 

+S IIELP++LANQIAAGEV+ERP+SV KELVENAIDAGSSQI IE+EE+GLKK+QITOM 
Sbjct: 1 MSHIIELPEMLANQIAAGEVIERPASVCKELVENAIDAGSSQIIIEIEEAGLKKVQITDN 60 

Query: 80 GEGMTSEDAVLSLRRHATSKIKSQSDLFRIRTLGFRGEALPSIASISLMTIKTATEQGKQ 139 

G G+ ++ L+LRRHATSKIK+Q+DLFRIRTLGFRGEALPSIAS+S++T+ TA + 
Sbjct: 61 GHGIAHDEVELALRRHATSKIKNQADLFRIRTI^FRGEALPSIASVSVLTLLTAvDGASH 120 

Query: 140 GTLLVAKGGNIEKQEWSSPRGTKILVENDFFNTPARLKYMKSLQSELAHIIDIVNRLSL 199 
GT LVA+GG +E+ +SP GTK+ VE4-LFFNTPARLKYMKS Q+EL+HIIDIVNRL L 

Sbjct: 121 GTKLVARGGEVEEVIPATSPVGTKVCVEDLFFNTPARLKYMKSQQAELSHIIDIVNRLGL 180 



Query: 200 AHPEVAFTLINDGKEMTKTSGTGDLRQAIAGIYGLNTAKKMIEISNADLDFEISGYVSLP 259 

AHPE++F+L1+DGKEMT+T+GTG LRQAIAGIYGL +AKKMIEI N+DLDFEISG+VSLP 
Sbjct: 181 AHPEISFSLISDGKEMTRTAGTGQLRQAIAGIYGLVSAKKMIE1ENSDLDFEISGFVSLP 240 

Query: 260 ELTRANRNYITLLINGRYIKNFLIjNRSILDGYGSKLMVGRFPIAVIDIQIDPYLADWVH 319 

ELTRANRNYI+L INGRYIKNFLLNR+ILDG+GSKLMVGRFP+AVI I IDPYLADVNVH 
Sbjct: 241 ELTRANRlWISLFINGRYIKNFLLNRaiLTOFGS^rVGRFPLAVIHIHIDPYI^VWH 300 

Query: 320 PTKQEVRISKERELMSLISTA1SESLKQYDLIPDALENLAKTSTRSVDKPIQTSFSLKQP 379 

PTKQEVRISKE+ELM+L+S A1+ SLK+ LIPDALENLAK++ R+ +K QT LK+ 
Sbjct: 301 PTKQETOISKEKELMTLVSEAlANSLKEQTLIPDALENlAKSTVRNRE?CVEQriLPLKEN 360 

Query: 380 GLYYDFAKNDFFIGADTVSEPIA1<IFTNLDKSEGSVDNDVKNSVNQGATQSPNIKYASRDQ 439 

BYY++ + + 4E L ■+ K ++++ T+ + +A R 

Sbjct: 361 TLYYEKTEP SRPSQTEVADYQVELTDEGQDLTLFAKETIJ3R-LTKPAKLHFAERKP 415 

Query: 440 ADSENF1HSQDYLSSKQSLNKLVEKLDSEESSTFPELEFFGQMHGTYLFAQGNGGLYIID 499 

A+ + H + L+ S+4K +KL+ EE+S+FPELEFFGQMHGTYLFAQG GLYIID 
Sbjct: 416 ANYDQLDHPELDLA---SIDKAYDKLEREEASSFPELEFFGQMHGTYLFAQGRDGLYIID 472 

Query: 500 QHAAQERVKYEYYREKIGEVDNSLQQLLVPFLFEFSSSDFLQLQEKMSLLQDVGIFLEPY 559 

QHAAQERVKYE YRE IG VD S QQLLVP++FEF + D L+L+E+M LL++VG+FL Y 
Sbjct: 473 QHAAQERVKYEEYRESIGNVDQSQQQLLVEYIFEFPADDALRLKERMPLLEEVGVFLAEY 532 

Query: 560 GNNTFILREHPIWMKEEEVESGIYSMCDMLLLTlffiVSVKCYRAELAIMMSCKRSIKANHT 619 

G N FILREHPIWM EEE+ESGIYEMCDMLLLT EVS+KKYRAELAIMMSCKRSIKANH 
Sbjct: 533 GENQFILREHPIWMAEEEIESGIYEMCBMLLLTKEVSIK^ 592 
Query: 620 LDDYSARHLLDQIAQCKNPYNCPHGRP'^LVNFTI^ADMEKIIFKRIQENHTSLRDLGKY 676 

+DD+SAR LL QL+QC NPYNCPHGRPVLV+FTK+DMEMF+RIQENHTSLR+LGKY 
Sbjct: 593 IDDHSARQLLYQLSQCDNp-rNCPHGRPVLVHFTKSDMEKHFRRIQENHTSLRELGKY 649 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3331> which encodes the amino acid 
sequence <SEQ ID 3332>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.1854 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 502/663 (75%) , Positives = 574/663 (85%) , Gaps = 9/663 (1%) 

Query: 20 LSKIIELPDILANQIAAGEWERPSSWKELVENAIDAGSSQITIEVEESGLKKIQITDN 79 

++ IIELP++LANQIAAGEWERP+SWKELVEHAIDA SSQIT+E+EESGLK IQ+TDN 
Sbjct: 14 MTNIIELPEVLANQIAAGEVVERPASVVKELVENAIDAKSSQITVEIEESGLKMIQVTDN 73 

Query: 80 GEGMTSEDAVLSLRRHATSKIKSQSDLFRIRTLGFRGEALPSIASISLMTIKTATEQGKQ 139 

GEGM+ ED LSLRRHATSKIKSQSDLFRIRTLGFRGEALPS+ASIS +TIKTAT+4 
Sbjct: 74 GEGMSHEDLPLSLRRHATSKIKSQSDLFRIRTLGFRGEALPSVASISKITIKTATKEVTH 133 

Query: 140 GTLLVAKGGNIEKQEWSSPRGTKILVENLFFIfTPARLKYMKSLQSELAHIIDIVNRLSL 199 

G+LL+A GG IE E +S+P GTKI VENLF+NTPARIiKYMKSLQ+ELAH I +D+VNRLSL 
Sbjct: 134 GSLLIATGGEIETLEMSTPTGTKIKVEOTFMWPA^KXMKSLQAELAHIVDvVNRLSL 193 

Query: 200 AHPEVAFTLINDGKEMTKTSGTGDLRQAIAGIYGLNTAKKMIEISNADLDFEISGYVSLP 259 

AHPEVAFTLI+DG+++T+TSGTGDLRQAIAGIYGLNT KKM+ ISNADLDFE+SGYVSLP 
Sbjct: 194 AHPEVAFTLISDGRQLTQTSGTGDLRQAIAGIYGIOTTKKMLAISNADLDFEVSGYVSLP 253 

Query: 260 ELTRANRNYITLLINGRYIKNFLIJTOSILDGYGSKLMVGRFPIAVIDIQIDPYLADVNVH 319 

ELTRANRNY+T+L+NGRYIKNFLIiNR+ILDGYGSKLMVGRFPI VIDIQIDPYLADVNVH 
Sbjct: 254 ELTRANRISTYMTILWGRYI KNFLIiNRAILDGYGSKLMVGRFPIWIDIQIDPYIiAD Vl^TTO 313 

Query: 320 PTKQEVRISKERELMSLISTAISESLKQYDLIPDALENIAKTSTRSVDKPIQTSFSLKQP 379 

PTKQEVRISKERELM+LISTAISESLK+ DLIPDALENLAK+STR KP QT L+ 
Sbjct: 314 PTKQEWISKERELMALISTAISESDKEQDLIPDALENLAKSSTRHFSKPEQTQLPLQSR 373 

Query: 380 GLYYDRAKND FF I GADTVS E P I AN FTNLDKSDGS VDNDVKNSV NQGATQSPNIK 433 

GLYYD KNDFF+ VSE I D G+VDN VK ++ ++K 

Sbjct: 374 GLYYDPQKNDFFVKESAVSEKI PETDFYSGAVDNSVKVEKVELLPHSEEVIGPSSVK 430 

Query: 434 YASRDQADSENFIHSQDYLSSKQSLNKLVEKLDSEESSTFPELEFFGQMHGTYLFAQGNG 493 

+ASR Q H L ++Q L++++ +L++E S FPEL++FGQMHGTYLFAQG 

Sbjct: 431 HRSRPQNTFTETDHPNLDLKNRQKLSQMLTRLENEGQSVFPELDYFGQMHGTYLFAQGKD 490 

Query: 494 GLYIIDQHAAQERVKYEYYREKIGEVDNSLQQLliVPFLFEFSSSDFLQLQEKMSLLQDVG 553 

GL+IIDQHAAQERVKYEYYR+KIGEVD+SLQQLLVP+LFEFS SDF+ LQEKM+LL +VG 
Sbjct: 491 GLFIIDQHAAQERVKYEYYRDKIGEVDSSLQQLLVPYLFEFSGSDFINLQEKMALIiNEVG 550 

Query: 554 IFLEPYGNNTFILREHPIVMKEEEVESGIYEMCDMLLLTNEVSVKKYRAELAIMMSCKRS 613 

IFLE YG+NTFILREHPIWMKEEE+ SG+YEMCDMLLLTNEVS+K YRAELAIMMSCKRS 
Sbjct: 551 IFLEVYGHOTFirjREHPIWMKEEEIASGVYEMCDMLLLTOEVSIKTYRAELAIMMSCKRS 610 

Query: 614 IKANHTLDDYSARHLLDQ^QCKNPYNCPHGRPVLWTKAD^KMFKRIQENHTSLRDLGKY 676 

IKANH+LDDYSAR+LL QLAQC+NPYNCPHGRPVL+NF+KADMEKMF+RIQENHTSLR+LGKY 
Sbjct: 611 IKAiraSLDDYSARNLLLQIAQCQNPYNCPHGRPVLINFSKADMEKMFRRIQENHTSLRELGKY 673 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1081 

A DNA sequence (GBSx1155) was identified in S.agalactiae <SEQ ID 3333> which encodes the amino 
acid sequence <SEQ ID 3334>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3372 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 


Example 1082 

A DNA sequence (GBSx1156) was identified in S.agalactiae <SEQ ID 3335> which encodes the amino 
acid sequence <SEQ ID 3336>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood / Transmembrane table (likelihood values not legible in the source). 
Legible residue ranges: 111 - 127, 226 - 242, 271 - 287, 302 - 320, 303 - 319, 325 - 342, 387 - 412 


Final Results 

bacterial membrane --- Certainty=0.6504 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



A related GBS nucleic acid sequence <SEQ ID 10265> which encodes amino acid sequence <SEQ ID 
10266> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA61918 GB:X89779 LmrP integral membrane protein [Lactococcus lactis] 
Identities = 145/401 (36%), Positives = 236/401 (58%), Gaps = 4/401 (0%) 

Query: 9 VKEFFALPKQLQLRELLRFISITVGSAIFPFMM.IYYVQYFGNLVTGILIIITQLSGFVAT 68 

+KEF+ L K LQLR + F+ +F M +YY QY G+ +TGIL+ ++ ++ FVA 
Sbjct: 1 MKEFWNLDKNLQLRLGIVFLGAFSYGTVFSSMTIYYNQYLGSAITGILLALSAVATFVAG 60 


Query: 69 LYGGEJLSnAMGRKKVVIIGSLLATIGWAITIAANVPNHITPHLTFVGILIIEIAHQFYFP 128 

+ G +D GRK V++ G+++ +G A+ IA+N+P H+ P TF+ L+I + F 
Sbjct: 61 IIAGFFADRNGRKPVMVFGTIIQLLGAALAIASNLPGff/NPWSTFIAFLLISFGYNFVIT 120 



Query: 129 AYEAMTIDLTNEQNRRFVYTIGYWLVNIAVMLGSGIAGIFYDHHFFELLIVLLIISAICC 188 
A AM ID +N +NR+ V+ + YW N++V+LG+ + + F LL++LL+ + 

Query: 189 

T K D+ F+ Y VL DK ++++ I ++ + +Q DN+ 
Sbjct: 181 --TVKVDEKfiENIFQAYKTVLQDKTYMIFMGANIATTFIIMQFDNP 237 

Query: 248 

V+L +F+ ++ G I G +ML++ 4 
+L+VLLMTT+N+ ++W ++ I C 
Sbjct: 238 

Query: 308 

TF I IA +T GE++Y P+ Q h A++M KIGSY+G AI 
Sbjct: 298 

Query: 368 

P+AS+LAG +VS+S 
Sbjct: 358 


A related DNA sequence was identified in S.pyogenes <SEQ ID 3337> which encodes the amino acid 
sequence <SEQ ID 3338>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood / Transmembrane table (likelihood values not legible in the source). 
Legible residue ranges: 266 - 282, 295 - 311; parenthesised ranges: (161 - 188), (376 - 403), (261 - 285), (291 - 313), (355 - 374), (315 - 331) 


Final Results 

bacterial membrane --- Certainty=0.5564 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:CAA61918 GB:X89779 LmrP integral membrane protein [Lactococcus lactis] 
Identities = 138/400 (34%), Positives = 223/400 (55%), Gaps = 2/400 (0%) 

Query: 1 MQEFLNLPKQIQLRQLWFWITLGSSIFPF^ff^IYYTTYFGTFWIGLLNMITSLMGFVGT 60 
M+EF NL K +QLR + F+ +H-F M +YY Y G+ TG+L+ ++++■ FV 

Sbjct: 1 MKEFWNLDKNLQLRLGIVFLGAFSYGTVFSSMTIYYNQYLGSAITGILLALSAVATFVAG 60 

Query: 61 LYGGHLSDALGRKKVIMIGSVGTTLGWFL.TILANLPNAAIPVJLTFAGILLVEIASSFYGP 120 

+ G +D GRK V+-+ G++ LS II +NLP PW TF LL+ +F 
Sbjct: 61 ILAGFFADRNGRKPVMVFGTIIQLLGAALAIASNLPGHVNPWSTFIAFLLISFGYNFVIT 120 

Query: 121 AYFA^IDLTDESNP^FWTINYflFINIAWFGAGLSGLFYDHHFIALLVALLLW^CF 180 

A AM+ID ++ NR+ V+ ++YW N++V+ GA L + F ALLV LLL ++ F 
Sbjct: 121 AGNAMIIIffiSNAENRKWFMLDYWAQNLSVILGAALGAWLFRPAFEALLVILLLTVIjVSF 180 

Query: 181 GVAYYCFDETRPETHAFDHGKGLLASFQNYRQVFHDRAFVLFTLGAIFSGSIWMQMDNYV 240 

+ + ET T D + FQ Y+ V D+ +++F I + I HQ DN++ 

Sbjct: 181 FLTTFVMTETFKPTVKVDEKAENI - - FQAYKTVLQDKTYMI FMGANIATTFI IMQFDNFL 238 

Query: 241 PVHLKLYFQPTAVLGFQVTSSKMLSLMVI1TNTI1LIVLFMTVTO 300 

PVHL F+ GF++ +ML++ ++ +L+VL MT +N+LT+ W + GSL 

Sbjct: 239 PVHLSNSFKTITFWGFEIYGQRMLTIYLILACVLWLLMTTLNRLTKDWSHQKGFIWGSL 298 

Query: 301 LFTLGMLLSFTFTQFYAIWLSWLLTFGEMIN\'SASQVLRADMMDHSQIGSYTGFVSMAQ 360 
Query: 361 PLGMLASLLVSVSHFTGPLGVQCLFAVIALLGIYFTWS 400 

P+ +ILA LLVS+S +GV + A+ +L I +V+ 

Sbjct: 359 PIASILAGLLVSISPMIKAIGVSLVIiALTF^LAIILVLVA 398 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 228/406 (56%) , Positives = 305/406 (74%) 

Query: 9 VKEFFALPKQLQLRELLRFISITVGSAIFPPWflMYYVQYFGNLVTGILIIITQLSGFVAT 68 

++EF LPKQ+QLR+L+RF++IT+GS+IFPFMAMYY YFG TG+L++IT L GFV T 
Sbjct: 1 MQEFLNLPKQIQLRQLVRFVTITLGSS I FPFMAMYYTTYFGTFWTGLLMMITSLMGFVGT 60 

Query: 69 LYGGHLSDAMGRKKWI IGSLLATIGWAITIAftNVPNHITPHLTFVGILI IEIAHQFYFP 128 

LYGGHLSDA+GRKKV++IGS+ T+GW +TI AN+PN P LTF GIL++EIA FY P 
Sbjct: 61 LYGGHLSDALGRKKVIMIGSVGTTLGWFLTILflKLPNAAIPWLTFAGILLVEIASSFYGP 120 

Query: 129 AYFAMTIDLTNEQNRRFVYTIGYWLVNIAVMLGSGIAGIFYDHHFFELLIVLLIISAICC 188 

AYEAM IDLT+E NRRFVYTI YW +NIAVM G+G++G+FYDHHF LL+ LL+++ +C 
Sbjct: 121 AYF^LIDLTDESmRFWTINYWFINIAVMFGAGLSGLFYDHHFLALLVALLLVNVLCF 180 

Query: 189 FVVYFKFDETKPQEGTFKHDKGVLGTFKNYSQVIVDKAFVVYTLGAIGSSVVWLQVDNYF 248 

V Y+ FDET+P+ F H KG+L +F+NY QV D+AFV+ +TLGAI S +W+Q+DNY 
Sbjct: 181 GVAYYCFDETRPETHAFDHGKGLLASFQNYRQVFHDRAFVLFTLGAIFSGSIWMQMDNYV 240 

Query: 249 SVl^KQNFEWSILGHTITGAKMLSIAVFTim.LlVLLMTTINKFIEimPLKRQLILGSL 308 

V+LK F+ ++LG +T +KMLSL V TNTLLIVL MT +NK E W L QL++GSL 
Sbjct: 241 PVHLKLYFQPTAVLGFQVTSSKMLSLMVLTOTLLIVLFMTVVNKLTEKWKLLPQLWGSL 300 

Query: 309 ICGFG^HjFNISLNTFGAILIAMTFFTFGEMIYVPASQVIjRAEMMVEGKIGSYSGFLAIAQ 368 

+ GML + + FAI +++ TFGEMI V ASQVLRA+MM +IGSY+GF+++AQ 
Sbjct: 301 LFTLGMLLSFTFTQFYAIWLSWLLTFGEMINVSASQVLRADMMDHSQIGSYTGFVSMAQ 360 

Query: 369 PWASVLAGAMVSLSYFTGKIGVQITLTIFMLAGLvLILYATKMKNI 414 

P+ ++LA +VS+S+FTG +GVQ + L G+ + + KMK + 
Sbjct: 361 PLGAILASLLVSVSHFTGPLGVQCLFAVIALLC-IYFTWSAKMKKV 406 



A related GBS gene <SEQ ID 8725> and protein <SEQ ID 8726> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
SRCFLG: 0 

McG: Length of UR: 4 

Peak Value of UR: 1.73 

Net Charge of CR: 1 
McG: Discrim Score: -4.26 
GvH: Signal Score (-7.5): -2.48 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 
Amino Acid Composition: calculated from 1 



ALOM program count: 12 
INTEGRAL Likelihood / Transmembrane table (12 rows; likelihood values and most residue ranges not legible in the source). 
Legible residue ranges: 263 - 279, 294 - 312 
icml HYP ID : 7 CFP: 0.660 



Final Results 

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

ORF01675(325 - 1530 of 1854) 

EGAD|40187|42348(1 - 400 of 408) integral membrane protein (lmrP) {Lactococcus lactis} 
GP|1052754|emb|CAA61918.1||X89779 LmrP integral membrane protein {Lactococcus lactis} 
PIR|S58131|S58131 integral membrane protein LmrP - Lactococcus lactis 
%Match = 21.7 

%Identity = 36.2 %Similarity = 60.8 

Matches = 145 Mismatches = 155 Conservative Sub.s = 99 



LQKLIWRKCIjNESKKIIQASGI*ENIDNYLLGKKGEK^KEFFALPKQLQLRELLRFISITVGSAIFPFM7AMYYVQYFG1IL 
:|||: I I IMI : h =11 =11 11=1= 

MKEFWNLDKNLQLRLGIVFLGAFSYGTVFSSMTIYYNQYLGSA 



483 513 543 573 603 633 663 693 

VTGILIIITQLSGFVATLYGGHLSDAMGRKKWIIGSLLATIGKA1TIAANVPNHITPHLTFVGILIIEIAHQFYFPAYE 
=1111= = = = = III : I = = l III |:: 1=== =1 |: 11=1=1 1= I 11= 1=1 =1 I 
ITGILimiSAVATFVAGILAGFFADRNGRKPVWFGTIIQLLGAAIAIASNLPGHVNPVISTFIAFLLISFGYNFVITAGN 
60 70 80 90 100 110 120 

723 753 783 813 843 873 900 930 

AMTIDLlTOQNRRFVYTIGYWLTOIAVMLGSGIAGIFYDHHFFELLIVIillSAICCFvVYFKFDET-KPQEGTFKHDKG 

II || =| =11= 1= = II l=:|:||: = :: I 11-11= =1=1 II II 111= 

A^IIIDASNAENRKVVFMLDYWAQNLSVILGA?lLGAWLFRPAFEALLVILLL^VLVSFFLTTFVMTETFKP TVKVDEK 

140 150 160 170 180 190 200 

960 990 1020 1050 1080 1110 1140 1170 

VLGTFKNYSQVLVDKAFVVYTLGAIGSSVWLQVDire 

1= I II II ==== I == = =1 ||:: 1=1 =|: -III =ll== = =1=111111= 

AENIFQAYKTVLQDKTYMIFMGANIATTFIIMQFDNFLPvHLSNSFKTITFWGFEIYGQRMLTIYLILACVLVVLLMTTL 
210 220 230 240 250 260 270 280 

1200 1230 1260 1290 1320 1350 1330 1410 

NKFIENWPLKRQLILGSLICGFGMLFNISIjOT'FGAILIAMTFFTFGEMIYvPASQVLRAEI'IMVEGKIGSYSGFLAIAQPV' 
|:: =:| :: :| ||| ||:|: || hi I = I = I I = s | | : | | | : : | | | | || : | || |: 

NRLTKDWSHQKGFIWGSLFMAIGMIFSFLTTTFTPIFIAGIVYTLGEIVYTPSVQTLGADLMNPEKIGSYNGVAAIKMPI 
290 300 310 320 330 340 350 360 

1440 1470 1500 1530 1560 1590 1620 1650 

ASVLAGAMVSLSYFTGKIGVQITLTIFMLAGLVLILYATKMKNIEIGK*NVRLY*RKIE*NNG*IYCCGNSWIGIHDICG 
11 = 111 =11 = 1 III = I = = "hi I 

ASILAGLLVSISPMIKAIGVSLVLALTEVLAIILVLVAVNRHQKTKLN 
370 380 390 400 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1083 

A DNA sequence (GBSx1157) was identified in S.agalactiae <SEQ ID 3339> which encodes the amino 
acid sequence <SEQ ID 3340>. This protein is predicted to be Holliday junction DNA helicase (ruvA). 
Analysis of this protein sequence reveals the following: 

Possible site: 37 
- 91 ( 74 - 91) 

Final Results 

bacterial membrane --- Certainty=0.1702 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04943 GB:AP001511 holliday junction DNA helicase [Bacillus halodurans] 
Identities = 86/201 (42%), Positives = 122/201 (59%), Gaps = 5/201 (2%) 

Query: 1 MYDYIKGKLSKITAKFIWETAGLGYKIYVANPYSFSGYVNQEVTIYLHQVIRDDAHLLF 60 

M DY++G L+ I ++ WE G+GY +Y NPY F + +TIY Q +R+D L+ 
Sbjct: 1 MIDYLRGTLTDIDHQYAWEVHGVGYQVYCPNPYEFEKERDSVITIYTFQYVREDVIRLY 60 

Query: 61 GFHTENEKEIFLNLISVSGIGPTTALAIIAVDDNEGLVSAIDNSDIKYLTKFPKIGKKTA 120 

GF T+ ++ +F L++VSGIGP ALAI+A E ++ AI+ D +L KFP +GKKTA 
Sbjct: 61 GFRTKEKRSLFEKLLNVSGIGPKGALAILATGQPEHVIQAIEEEDEAFLVKFPGVGKKTA 120 

Query: 121 QQMILDLSGKFVE ASGESATSRKVSSEQNSNLEEAMEALLALGYKATELKKVKA 174 

+Q+ILDL GK E + E ++ N L+EAMEAL ALGY ELKKVK 

Sbjct: 121 RQIILDLKGKVDELHPGLFSQKEEQPKPHEKNDGNQALDEAMEALKALGYVEKELKKVKP 180

Query: 175 FFEGTNETVEQYIKSSLKMLM 195 

E T + YIK +L++++ 
Sbjct: 181 KLEQETLTTDAYIKKALQLML 201 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3341> which encodes the amino acid 

sequence <SEQ ID 3342>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.59 Transmembrane 75 - 91 ( 74 - 91) 
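
Each INTEGRAL line carries a likelihood score, a transmembrane segment and a second range in parentheses. A minimal sketch of extracting these fields follows. The helper name `parse_integral` is hypothetical, and reading the parenthesised range as a wider candidate region is an assumption rather than something stated in this document.

```python
import re

# Hypothetical parser for lines of the form
#   INTEGRAL Likelihood = -1.59 Transmembrane 75 - 91 ( 74 - 91)
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line):
    """Return the fields of one INTEGRAL line, or None if it does not match."""
    match = INTEGRAL_RE.search(line)
    if match is None:
        return None
    start, end, outer_start, outer_end = (int(g) for g in match.groups()[1:])
    return {
        "likelihood": float(match.group(1)),
        "segment": (start, end),
        # Assumption: the parenthesised pair is a wider candidate range.
        "outer": (outer_start, outer_end),
        "segment_length": end - start + 1,
    }


record = parse_integral(
    "INTEGRAL Likelihood = -1.59 Transmembrane 75 - 91 ( 74 - 91)"
)
print(record)
```

For the line above, the segment 75 - 91 spans 17 residues, which is the same window length seen in the other INTEGRAL lines of this section.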



Final Results

bacterial membrane --- Certainty=0.1638 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB04943 GB:AP001511 holliday junction DNA helicase [Bacillus halodurans] 
Identities = 91/201 (45%), Positives = 128/201 (63%), Gaps = 5/201 (2%)

Query: 1 MYDYIKGQLTKITAKYIWEANGLGYMINVANPYSFTDSWQLVTIYLHQVIREDAHLLF 60 
M DY++G LT I +Y WE +G+GY + NPY F + ++TIY Q +RED L+

Sbjct: 1 MIDYLRGTLTDIDHQYAWEVHGVGYQVYCPNPYEFEKERDSVTTIYTFQYVREDVIRLY 60 

Query: 61 GFHTEDEKDVFLKLISVSGIGPTTALAIVAVDDNEGLVNAIDNSDIKYLMKFPKIGKKTA 120 
GF T++++ +F KL++VSGIGP ALAI+A E ++ AI+ D +L+KFP +GKKTA 
Sbjct: 61 GFRTKEKRSLFEKLLNVSGIGPKGALAILATGQPEHVIQAIEEEDEAFLVKFPGVGKKTA 120

Query: 121 QQMVLDLAGKFVEA PQETGHTKARSNKAC-NTQLDEAIEALLALGYKAKELKKIRA 175 

+Q++LDL GK E Q+ K GN LDEA+EAL ALGY KELKK++ 

Sbjct: 121 RQIILDLKGKVDELHPGLFSQKEEQPKPHEKNDGNQALDEAMEALKALGYVEKELKKVKP 180


Query: 176 FFEGTSETAEQYIKSALKLLM 196 

E + T + YIK AL+L++ 
Sbjct: 181 KLEQETLTTDAYIKKALQLML 201 
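
The statistics line that heads each alignment gives counts as fractions followed by a percentage. A minimal sketch of reading such a line and recomputing the percentages is shown below. The helper name `parse_blast_stats` is hypothetical; for the header used here the printed values match the recomputed ones after dropping the decimals, which is an observation about this one line rather than a documented rule.

```python
import re

# Hypothetical helper: extracts "name = numerator/denominator" pairs from a
# BLAST-style statistics line.
FRACTION_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_blast_stats(header):
    """Return {name: (numerator, denominator, percent)} for one header line."""
    stats = {}
    for name, num, den in FRACTION_RE.findall(header):
        num, den = int(num), int(den)
        stats[name] = (num, den, 100.0 * num / den)
    return stats


header = "Identities = 91/201 (45%), Positives = 128/201 (63%), Gaps = 5/201 (2%)"
stats = parse_blast_stats(header)
for name, (num, den, pct) in stats.items():
    print(f"{name}: {num}/{den} = {pct:.2f}%")
```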

An alignment of the GAS and GBS proteins is shown below.

Identities = 153/197 (77%), Positives = 176/197 (88%), Gaps = 1/197 (0%) 



Query: 1 MYDYIKGKLSKITAKFIWETAGLGYMIYVANPYSFSGYVNQEVTIYLHQVIRDDAHLLF 60 
Sbjct:

Query: 61 GFHTENEKEIFLNLISVSGIGPTTALAIIAVDDNEGLVSAIDNSDIKYLTKFPKIGKKTA 120

GFHTE+EK++FL LISVSGIGPTTALAI+AVDDNEGLV+AIDNSDIKYL KFPKIGKKTA
Sbjct: 61 GFHTEDEKDVFLKLISVSGIGPTTALAIVAVDDNEGLVNAIDNSDIKYLMKFPKIGKKTA 120

Query: 121 QQMILDLSGKFVEASGESA-TSRKVSSEQNSNLEEAMEALLALGYKATELKKVKAFFEGT 179

QQM+LDL+GKFVEA E+ T + + N+ L+EA+EALLALGYKA ELKK++AFFEGT 
Sbjct: 121 QQMVLDLAGKFVEAPQETGHTKARSNKAGNTQLDEAIEALLALGYKAKELKKIRAFFEGT 180 

Query: 180 NETVEQYIKSSLKMLMK 196 

+ET EQYIKS+LK+LMK 
Sbjct: 181 SETAEQYIKSALKLLMK 197
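
The identity count reported for an alignment can be checked against the aligned rows themselves. The sketch below does this for the last block of the alignment above (Query 180 - 196 against Sbjct 181 - 197). The function names are hypothetical, and the middle line it builds marks identities only, because the "+" symbols for conservative substitutions depend on a scoring matrix that is not reproduced here.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def identity_midline(query, subject):
    """Build a middle line showing identical residues and blanks elsewhere."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, subject))


query = "NETVEQYIKSSLKMLMK"    # Query 180 - 196
subject = "SETAEQYIKSALKLLMK"  # Sbjct 181 - 197

identical = count_identities(query, subject)
print(f"{identical}/{len(query)} identical")
print(query)
print(identity_midline(query, subject))
print(subject)
```

Summing such per-block counts over all blocks of an alignment should reproduce the numerator of its "Identities" figure, provided the rows are free of transcription errors.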

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1084 

A DNA sequence (GBSx1159) was identified in S.agalactiae <SEQ ID 3343> which encodes the amino
acid sequence <SEQ ID 3344>. This protein is predicted to be DNA-3-methyladenine glycosidase I (tag). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2812 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10263> which encodes amino acid sequence <SEQ ID 
10264> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76573 GB:AE000432 3-methyl-adenine DNA glycosylase I,
constitutive [Escherichia coli K12]
Identities = 87/176 (49%), Positives = 122/176 (68%), Gaps = 1/176 (0%)

Query: 5 MKKCSWvNLDNPLYVAYHDKEWGRAVHDDHVLFELLCLETYQSGLSWETVIJJI^RQEFRQV 64 

M+RC WV+ D PLY+AYHD EWG D LFE++CLE Q+GLSW TVL KR+ +R 
Sbjct: 1 MERCGWVSQD-PLYIAYHDNEWGVPETDSKKLFEMICLEGQQAGLSWITVLKKRENYRAC 59

Query: 65 FHHYNIEKVAAMSDADLEIILQNPRVIRHRLKLFSTRQNARSIILIQKEFGSFDRYIWSF 124 

FH ++ KVAAM + D+E ++Q+ +IRHR K+ + NAR+ + +++ F ++WSF
Sbjct: 60 FHQFDPVKVAAMQEEDVERLVQDAGIIRHRGKICAIIGNARAYLQMEQNGEPFVDFVWSF 119 

Query: 125 TONK^VNSVIOTYHDVPASTTLSERLSKD^KKRGFKFVGPTCLYSFIQAAGMVNDH 180 

V+++ QV +++P ST+ S+ LSK LICKRGFKFVG T YSF+QA G+VNDH 

Sbjct: 120 VNHQPQVTQATTLSEIPTSTSASDALSKALKKRGFKFVGTTICYSFMQACGLVNDH 175 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3345> which encodes the amino acid 
sequence <SEQ ID 3346>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.4149 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
An alignment of the GAS and GBS proteins is shown below.

Identities = 114/184 (61%), Positives = 135/184 (72%)

FHMKRCSWWILDNPIjWAYHDKEWGRAVHDDHVLFELLCLETYQSGLSWETVLNKRQEFR 62 
FHMKRCSWV DN LY YHD EWG+ + DD FELLCLE+YQSGLSW TVL KRQ FR 
FHMKRCSWVPKDNQLYCDYHDLEWGQPLDDDRDFFELLCLESYQSGLSWLTVLKKRQAFR 61

QVFHHYNIEKVAAMSDADLEIILQNPRVIRHRLKLFSTRQNARSIILIQKEFGSFDRYIW 122 
VFHHY+I VA + +4- L+NP +IRH+LKL +T NA ++ IQKEFGSF Y+W 



+FV K N VN N VPA T LS RL+KDLKKRGFKF+GPT +YSF+QA+G+VNDHE 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1085 

A DNA sequence (GBSx1160) was identified in S.agalactiae <SEQ ID 3347> which encodes the amino
acid sequence <SEQ ID 3348>. This protein is predicted to be competence-damage inducible protein 
(cinA). Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have an uncleavable N-term signal seq 










Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10261> which encodes amino acid sequence <SEQ ID 
10262> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA84071 GB:Z34303 CinA protein [Streptococcus pneumoniae] 
Identities = 194/297 (65%), Positives = 236/297 (79%), Gaps = 1/297 (0%) 



L+D YGYG+ S+A V+E LK Q KTI AAESLTAGLFQ+ +A FSG 



S +F GGF TYS+E KS++L IP K L+E+GWS FTA+ MA+QAR ++DFGI LTG 
VSSIFEGGFVTYSLEEKSRMLDIPAKNLEEEGWSEFTAQKMAEQARSKTQSDFGISLTG 359 








VAGPD LEG+P GTVFIG+A +G IKV+IGG+SR+DVRHI+ +HAF+LVR+ALL 
A related DNA sequence was identified in S.pyogenes <SEQ ID 3349> which encodes the amino acid
sequence <SEQ ID 3350>. Analysis of this protein sequence reveals the following: 
Possible site: 22 

>>> Seems to have no N-terminal signal sequence

134 - 150 ( 134 - 150) 

Final Results

bacterial membrane --- Certainty=0.1765 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA84071 GB:Z34303 CinA protein [Streptococcus pneumoniae]
Identities = 286/417 (68%), Positives = 336/417 (79%), Gaps = 1/417 (0%)

Query: 1 MKAELIAVGTEILTGQIVNTNAQFLSEKMAELGIDVYFQTAVGDNEERLLSVITTASQRS 60 
MKAE+IAVGTEILTGQIVNTNAQFLSEK+AE+G+DVYFQTAVGDNE RLLS++ ASQRS 
Sbjct: 1 MKAEIIAVGTEILTGQIVNTNAQFLSEKLAEIGVDVYFQTAVGDNEVRLLSLLEIASQRS 60

Query: 61 NLVILCGGLGPTKDDLTKQTIAKYLRKDLVYDEQACQKLDDFFAKRKPSSRTPNNERQAQ 120 

+LVIL GGLG T+DDLTKQTLAK+L K LV+D QA +KLD FFA R +RTPNNERQAQ 
Sbjct: 61 SLVILTGGLGATEDDLTKQTIAKFLGKALVFDPQAQEKLDIFFALRPDYARTPNNERQAQ 120 


Query: 121 VIEGSIPLPNKTGIAVGGFITVDGISYVVLPGPPSELKPMVNEELVPLLSKQYSTLYSKV 180

++EG+IPLPN+TGLAVGG + VDG++YVVLPGPPSELKPMV +L+P L S LYS+V
Sbjct: 121 IVEGAIPLPNETGLAVGGKLEvDGOTYVVLPGPPSELKPMVLNQLLPKLMTG-SKLYSRV 179 

Query: 181 LRFFGIGESQLVTVLSDFIENQTDPTIAPYAKTGEVTLRLSTKTENQALADKKLGQLEAQ 240

LRFFGIGESQLVT+L+D I+NQ DPT+APYAKTGEVTLRLSTK +Q A++ L LE Q 
Sbjct: 180 LRFFGIGESQLOTIIADLlDNQIDPTIAPYAKTGEvTLRLSTKASSQEEANQALDILENQ 239 



Query: 241 LLSRKTLEGQPLADVFYGYGEDNSLARETFELLVKYDKTITAAESLTAGLFQSTLASFPG 300 

+L +T EG L D YGYGE+ SLA E L + KTI AAESLTAGLFQ+T+A+F G 
Sbjct: 240 ILDCQTFEGISLRDFCYGYGEETSIASIVVEELKRQGKTIAAAESLTAGLFQATVANFSG 299 

Query: 301 ASQVFNGGFVTYSMEEKAKMLGLPLEELKSHGWSAYTAEGMAEQARLLTGADIGVSLTG 360 

S +F GGFVTYS+EEK++ML +P + L+ HGWS +TA+ MAEQAR T +D G+SLTG 
Sbjct: 300 VSSIFEGGFVTYSLEEKSRMLDIPAKNLEEHGWSEFTAQKMAEQARSKTQSDFGISLTG 359 



Query: 361 VAGPDMLEEQPAGTVFIGLATQNKVESIKVLISGRSRLDVRYIATLHAFNMVRKTLL 417 

VAGPD LE P GTVFIGLA E IKY I GRSR DVR+IA +HAFN+VRK LL 

Sbjct: 360 VAGPDSLEGHPVGTVFIGLAQDQGTEVIKVNIGGRSRADVRHIAVMHAFNLVRKALL 416 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/299 (67%), Positives = 242/299 (80%)



Query: 1 MVEGSIPLQNLTGLAVGGIVTSKGVQWWLPGPPSELKPMVMEQWPILSNWGTKLYSRV 60 

++EGSIPL N TGIoAVGG +T G+ Y+VLPGPPSELKPMV E++VP+LS + LYS+V 
Sbjct: 121 VIEGSIPLPNKTGIAVGGFITVDGISYVVLPGPPSELKPMVNEELVPLLSKQYSTLYSKV 180

Query: 61 LRFFGIGESQLVTILEDIIKNQTDPTIAPYAKVGEVTLRLSTKAENQDEADFKLDSLEKE 120

LRFFGIGESQLVT+L D I+NQTDPTIAPYAK GEVTLRLSTK ENQ AD KL LE +
Sbjct: 181 LRFFGIGESQLVTVLSDFIENQTDPTIAPYAKTGEVTLRLSTKTENQALADKKLGQLEAQ 240 

Query: 121 ILALKTLDNRKLKDLLYGYGDNNSMARTVLELLKVQNKTITAAESLTAGLFQSQL2ffiFSG 180 

+L+ KTL+ + L D+ YGYG++NS+AR ELL +KTITAAESLTAGLFQS LA F G
Sbjct: 241 LLSRKTLEGQPLADVFYGYGEDNSLARETFELLVKYDKTITAAESLTAGLFQSTLASFPG 300

Query: 181 ASQVFNGGFTTYSMEAKSQLLGIPKKKLQEYGWSHFTAEAMAQQARQLLKADFGIGLTG 240 

ASQVFNGGF TYSME K+++LG+P ++L+ +GWS +TAE MA+QAR L AD G+ LTG
Sbjct: 301 ASQVFNGGFVTYSMEEKAKMLGLPLEELKSHGWSAYTAEGMAEQARLLTGADIGVSLTG 360
Query: 241 VAGPDELEGYPAGTVFIGIATPEGYSSIKVSIGGKSRSDVRHISTLHAFDLVRRALLKI 299

VAGPD LE PAGTVFIG+AT V SIKV I G+SR DVR+ 1 +TLHAF+ +VR+ LLK+ 
Sbjct: 361 VAGPDMLEEQPAGTVFIGLATQNKVESIKVLISGRSRLDVRYIATLHAFNMVRKTLLKL 419

SEQ ID 3348 (GBS646) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 131 (lane 2-4; MW 61.6kDa), in Figure 134 (lane 3; MW 57.5kDa + lanes 2 & 4; 
MW 27kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 131 (lane 5-7; MW 36.6kDa) and in Figure 178 (lane 5; MW 37kDa). 

GBS646-His was purified as shown in Figure 229, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1086 

A DNA sequence (GBSx1161) was identified in S.agalactiae <SEQ ID 3351> which encodes the amino
acid sequence <SEQ ID 3352>. Analysis of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.37 Transmembrane 148 - 164 ( 148 - 164) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3353> which encodes the amino acid 

sequence <SEQ ID 3354>. Analysis of this protein sequence reveals the following:

Possible site: 59 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.37 Transmembrane 148 - 164 ( 148 - 164) 

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD04860 GB:AF069745 RecA protein [Streptococcus parasanguinis] 
Identities = 333/381 (87%), Positives = 356/381 (93%), Gaps = 3/381 (0%) 



Query: 1
+AKK KK ++ITKKFGDER KAL+DALK IEKDFGKG++MRLGERAEQKVQVMSSGSLAL
Sbjct: 1 MAKKQKKLDDITKKFGDEREKALNDALKLIEKDFGKGSIMRLGERAEQKVQVMSSGSLAL 60

Query: 61 DIALGAGGYPKGRIIEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPAYAAAL 120
DIALGAGGYPKGRIIEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDP+YAAAL
Sbjct: 61 DIALGAGGYPKGRIIEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPSYAAAL 120

Query: 121 GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 180
GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ
Sbjct: 121 GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 180

Query: 181 ARMMSQAMRKLSASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYASVRLDVRG 240
ARMMSQAMRKL ASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYASVRLDVRG
Sbjct: 181 ARMMSQAMRKLGASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYASVRLDVRG 240



Query: 241 TTQIKGTGDQKDSSIGKETKIKVVKNKVAPPFKVAEVEIMYGEGISRTGELVKIASDLDI 300
TQIKGTGDQKD+++GKETKIKVVKNKVAPPFK A VEIMYGEGISRTGELVKIA+DLDI
Sbjct: 241 1WQIKGTGDQKDT1WGKETKIKV\™CTA??FKEPJvIVBIMYGSGISRTGELVKIATDLDI 300

Query: 301 IQKAGAWFSYNGEKIGQGSENAKRYLADHPELFDEIDLKVRVKFGLLEESEEESAMAVAS 360 

IQKAGAW+SYNGEKIGQGSENAK++LADHPE+FDEID KVRV FGL+E+ E ++ 
Sbjct: 301 IQKAGAWYSYNGEKIGQGSENAKKFLADHPEIFDEIDHKVRVHFGLIEKDEAVKSLDKTE 360 

Query: 361 EE- - 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 339/379 (89%) , Positives = 356/379 (93%) , Gaps = 






DIALGAGGYPKGRI+EIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPAYAAAL 



IQKAGAW+SYNGEKIGQGSENAK+YLAD+P +FDEID KVRV FG+ I 



E DDL LDLDN IEIE+ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1087 

A DNA sequence (GBSx1162) was identified in S.agalactiae <SEQ ID 3355> which encodes the amino
acid sequence <SEQ ID 3356>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2344 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10259> which encodes amino acid sequence <SEQ ID 
10260> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG37358 GB:AF028804 NrpR [Lactococcus lactis subsp. cremoris] 
Identities = 69/132 (52%), Positives = 102/132 (77%)
Query: 5 MIKIYTISSCTSCKKAKTWLNAHQLPYKEQNLGKESLTRDEILEILTKTESGIESIVSSK 64
MI IYT SCTSCKKAKTWL+ H +P+ E+NL + L+ EI +IL K + G+E ++SS+ 
Sbjct: 1 MITIYTAPSCTSCKKAKTWLSYHHIPFNERNLIADPLSTTEISQILQKCDDGVEGLISSR 60

Query: 65 NRYAKALNCNIEELSVNEVIDLIQENPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 124

NR+ K L + E++S+++ I +I ENP+I++ PI++D+KRL VGY E++IRAFLPR++R
Sbjct: 61 NRFVKTLGVDFEDISLSQAIKIISENPQIMRRPIIMDEKRLHVGYNEEEIRAFLPRTVRV 120

Query: 125 VENAEARLRAAL 136 

+EN ARLR+A+ 
Sbjct: 121 LENGGARLRSAI 132 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3357> which encodes the amino acid 
sequence <SEQ ID 3358>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2569 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/132 (88%), Positives = 128/132 (96%)

Query: 5 MIKIYTISSCTSCKKAKTWLNAHQLPYKEQNLGKESLTRDEILEILTKTESGIESIVSSK 64 

MIKIYTISSCTSCKKAKTWLNAH+L YKEQNLGKE LT++EIL IL+KTE+G+ESIVSSK 
Sbjct: 1 MIKIYTISSCTSCKKAKTWLNAHKLRYKEQNLGKEPLTKEEILAILSKTENGVESIVSSK 60

Query: 65 NRYAKALNCNIEELSVNEVIDLIQENPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 124 

NRYAKAL+C+IEELSV+EVIDLIQ+NPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 
Sbjct: 61 NRYAKALDCDIEELSVSEVIDLIQDNPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 120 

Query: 125 VENAEARLRAAL 136 

+EN EARLRAAL
Sbjct: 121 IENTEARLRAAL 132 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1088 

A DNA sequence (GBSx1163) was identified in S.agalactiae <SEQ ID 3359> which encodes the amino
acid sequence <SEQ ID 3360>. Analysis of this protein sequence reveals the following: 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3097 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04987 GB:AP001511 unknown [Bacillus halodurans] 
Identities = 49/82 (59%), Positives = 64/82 (77%), Gaps = 1/82 (1%) 


Query: 1 MGFTDETVRFRLDDSN-KVEISETLTAVYRSLEEKGYNPINQIVGYVLSGDPAYVPRYND 59 

M D T++F +++ V++ E L +VY +LEEKGYNPINQIVGY+LSGDPAY+PR+ D 
Sbjct: 1 MSSMDNTMKFNVNEEPVSVTJVQEVLMSVYEALEEKGYNPINQIVGYLLSGDPAYIPRHKD 60 
Query: 60 ARNQIRKYERDEIVEELVRYYL 81

AR IRK ERDE++EELV+ YL
Sbjct: 61 ARTLIRKLERDELIEELVKSYL 82

A related DNA sequence was identified in S.pyogenes <SEQ ID 3361> which encodes the amino acid 
sequence <SEQ ID 3362>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3097 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/88 (90%), Positives = 85/88 (95%)
Query: 



MGFTDETVRF+LDD +K +ISETLTAVY SL+EKGYNPINQIVGYVLSGDPAYVPRYNDA 
Sbjct: 1 MGFTDETVRFKLDDGDKRQISETLTAVYHSLDEKGYNPINQIVGYVLSGDPAYVPRYNDA 60

Query: 61 RNQIRKYERDEIVEELVRYYLQGNGIDL 88 

RNQIRKYERDEIVEELVRYYLQGNGID+ 
Sbjct: 61 RNQIRKYERDEIVEELVRYYLQGNGIDV 88 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1089 

A DNA sequence (GBSx1164) was identified in S.agalactiae <SEQ ID 3363> which encodes the amino
acid sequence <SEQ ID 3364>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1575 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10257> which encodes amino acid sequence <SEQ ID 
10258> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Sbjct: 

Query: 61 LPKNMNNTSGPRVEASQAYGDKITELFNLPVEYQDERLTTVQAERMLVEQADISRGKRKK 120

PKNMN T GPR EASQ + + +N+PV DERLTT+ AE+ML+ AD+SR KRKK 
Sbjct: 61 FPKMMNGTVGPRGFASQTFAKVLETTYOTPV-^WDERLTTMAAEKMLI-AADVSRQKRKK 119 



Query: 121 VIDKLAAQLILQNYLDRM 138 

VIDK+AA +ILQ YLD +
Sbjct: 120 VIDKMAAVMILQGYLDSL 137 
A related DNA sequence was identified in S.pyogenes <SEQ ID 3365> which encodes the amino acid
sequence <SEQ ID 3366>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1575 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/139 (82%), Positives = 126/139 (90%) 

Query: 1 MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEESGNFGFDRIAELVKEYKVDKFVVG 60

MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEE FGF RL ELVK+Y+ V+ + FV+G 
Sbjct: 1 MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEEKAKFGFTRLEELVKQYQVEQFVIG 60

Query: 61 LPKNMNNTSGPRVEASQAYGDKITELFNLPVEYQDERLTTVQAERMLVEQADISRGKRKK 120

LPKNMNNT+GPRV+AS YG+ I LF LPV YQDERLTTV+A+RML+EQADISRGKRKK
Sbjct: 61 LPKfMNNTNGPRVDASITYGmlEHLFGLPVHYQDERLTTVEAKRMLIEQADISRGKRKK 120 

Query: 121 VIDKLAAQLILQNYLDRMF 139 

VIDKLAAQLILQNYL+R F 
Sbjct: 121 VIDKLAAQLILQNYLNRNF 139

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1090 

A DNA sequence (GBSx1165) was identified in S.agalactiae <SEQ ID 3367> which encodes the amino
acid sequence <SEQ ID 3368>. Analysis of this protein sequence reveals the following: 
Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2631 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14697 GB:Z99118 yrzB [Bacillus subtilis] 
Identities = 50/94 (53%), Positives = 65/94 (68%), Gaps = 5/94 (5%)

Query: 12 EHQHEVITLVDENGNETLFEILLTIDGREEFGKNYVLLVPAGAEEDEQGEIEIQAYSFTE 71 

EH + IT+VD+ GNE L E+L T + EEFGK+YVL P +++DE E+EI A SFT 
Sbjct: 2 EHGEJOJITI VDDQGNEQLCEVLFTFEN-EEFGKSYVLYYPIESKDDE- -EVEILASSFTP 58 

Query: 72 NADGTEGDLQPIPEDSDAEWDMIEEVFNSFLDEE 105 

N DG G+L PI ++D EWDMIEE N+FL +E 
Sbjct: 59 NEDGENGELFPI - -ETDEEWDMIEETLNTFL&DE 90 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3369> which encodes the amino acid 
sequence <SEQ ID 3370>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3170 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/98 (91%), Positives = 94/98 (95%)

Query: 7 HDHNHEHQHEVITLVDENGNETLFEILLTIDGREEFGKNYVLLVPAGAEEDEQGEIEIQA 66
H+H ++HQHEVITLVDE GNETLFEILLTIDGREEFGKNYVLLVPAG+EEDE GEIEIQA

Sbjct: 3 
Query: 67 

YSFTEN DGTEGDLQPI PEDSDAEWDMIEEVFNSFLDE 
Sbjct: 63 YSFTENEDGTEGDLQPIPEDSDAEWDMIEEVFNSFLDE 100



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1091 

A DNA sequence (GBSx1166) was identified in S.agalactiae <SEQ ID 3371> which encodes the amino
acid sequence <SEQ ID 3372>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.2059 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1092 

A DNA sequence (GBSx1167) was identified in S.agalactiae <SEQ ID 3373> which encodes the amino
acid sequence <SEQ ID 3374>. This protein is predicted to be unnamed protein product. Analysis of this 
protein sequence reveals the following: 

Possible site: 53 



>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -9 Transmembrane 314 - 330 ( 308 -
INTEGRAL Likelihood Transmembrane 279 - 295 ( 274 -
INTEGRAL Likelihood Transmembrane 136 - 152 ( 135 -
INTEGRAL Likelihood Transmembrane 232 - 248 ( 226 -
INTEGRAL Likelihood Transmembrane 163 - 179 ( 162 -
INTEGRAL Likelihood Transmembrane 95 - 111 ( 94 -
INTEGRAL Likelihood Transmembrane 386 - 402 ( 386 -
INTEGRAL Likelihood Transmembrane 204 - 220 ( 204 -
INTEGRAL Likelihood Transmembrane 40 - 56 ( 40 - 57)
INTEGRAL Likelihood Transmembrane 186 - 202 ( 182 - 202)



Final Results

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10255> which encodes amino acid sequence <SEQ ID
10256> was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3375> which encodes the amino acid 
sequence <SEQ ID 3376>. Analysis of this protein sequence reveals the following: 

Possible site: 53
>>> Seems to have no N-terminal signal sequence










INTEGRAL Likelihood = -7.38 Transmembrane 315 - 331 ( 311 - 333)
INTEGRAL Likelihood = -6.48 Transmembrane 40 - 56 ( 37 - 61)
INTEGRAL Likelihood = -6.10 Transmembrane 278 - 294 ( 274 - 298)
INTEGRAL Likelihood = -5.57 Transmembrane 392 - 408 ( 387 - 410)
INTEGRAL Likelihood = -3.98 Transmembrane 186 - 202 ( - 208)
INTEGRAL Likelihood = -3.93 Transmembrane 339 - 355 ( 338 - 356)
INTEGRAL Likelihood = -2.97 Transmembrane 235 - 251 ( 228 - 253)
INTEGRAL Likelihood = -2.44 Transmembrane 166 - 182 ( 166 - 182)
INTEGRAL Likelihood = -2.23 Transmembrane 106 - 122 ( 106 - 125)
INTEGRAL Likelihood = -1.81 Transmembrane 83 - 99 ( 83 - 101)



Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9179> which encodes the amino acid sequence 
<SEQ ID 9180>. Analysis of this protein sequence reveals the following: 



Possible cleavage site: 13 












Seems to have a cleavable N-term signal seq. 










INTEGRAL Likelihood = -7.38 Transmembrane 243 - 259 ( 239 - 261)
INTEGRAL Likelihood = -6.10 Transmembrane 206 - 222 ( 202 - 226)
INTEGRAL Likelihood = -5.57 Transmembrane 320 - 336 ( 315 - 338)
INTEGRAL Likelihood = -3.98 Transmembrane 114 - 130 ( 112 - 136)
INTEGRAL Likelihood = -3.93 Transmembrane 267 - 283 ( 266 - 284)
INTEGRAL Likelihood = -2.97 Transmembrane 163 - 179 ( 156 - 181)
INTEGRAL Likelihood = -2.44 Transmembrane 94 - 110 ( 94 - 110)
INTEGRAL Likelihood = -2.23 Transmembrane 34 - 50 ( 34 - 53)


Final Results

bacterial membrane --- Certainty=0.395 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 200/480 (41%), Positives = 310/480 (63%), Gaps = 1/480 (0%)



Query: 40 ILLYSVLSTLLAIANPLLTYFANGLQTQNLYTGLKMTKGQIPYSDVFATGGFLYYVTIAL 99 

+L +S++ + L IA P LT ANGLQ+QNLY G+M+TKGQ+PYS F TGG Y+V IAL 
Sbjct: 40 LLFFSIIISSLTIAVPFLTDAANGLQSQNLYIGMMLTKGQLPYSAAFTTGGLFYFVIIAL 99 

Query: 100 SYLLGSSIWLLIVQFIAYWSGIYFYKLVYYVAQSEIVSIGMTLIFYIMNIVLGFGGMYP 159 

SY LGS+4WL+ VQ +Y+SG+Y YKL+ Y+ + V++ ++ 4Y4444 LGFGG+YP 
Sbjct: 100 SYYLGSTLWLVFVQVFCFYLSGLYLYKLINYMTGFQKVALTFSISYYLLSVSLGFGGLYP 159 

Query: 160 IQWALPFMLISLWFLIKFCVDNIVDEAFIFYGILAAFSLFIDPQTLIFWLCSFVLLTATN 219 

Q A+PF+LIS WFL K+ + DEAFI +G + A ++ IDP TLIFW + V + 4 N 
Sbjct: 160 TQLAMPFILISAWFLTKYFACLVKDEAFILFGFVGALAMLIDPSTLIFWSFACVTVFSYN 219

Query: 220 IKQKQSLRGFYQFLCWFGMILIAYWGYFMFNLQIISSYIDKAIFYPFTYFARTNHSFL 279 

I QK RGFYQ L +FGMIL4 YT GYF4 NLQ+++ Y+ + + YPFT+F N S L 
Sbjct: 220 ISQKHIARGFYQLLASIFGMILVFYTAGYFILNLQ\™PYLSQTMIYPFTFFKSGNLSLL 279 

Query: 280 LSLAIQIVVLLGSGCLFGLWDFIQNRKKASYQIGLNFIACIFIIYAIMAIFSRDFNLYHF 339 

LAIQ4 LG G L G4 4 1+ K S 4+ 4 + 44AIFS4D4 YH 

Sbjct: 280 FGIAIQLFFALGLGLLTGMEWIRRFKNNSDRVVKWLFVMVILESILVAIFSQDYRPYHL 339 
Query: 340 LPALPFGLLLTSNKITILYQKVIDRRSHRRQY-FSGKSLIVDLPVKKTYYLPLLLVSLSI 398

LP LPFGL+LT+ + Y + + SHRR++ +G ++ +++K+ +YLP+L+V + 
Sbjct: 340 LPLLPFGLILTAIPVGYQYGIGLGQSSHRRRHGKNGVGRVMMIYLKEHFYLPILIVGTIL 399

Query: 399 GLLVYNTYQNVTLSKERRDISHYLTTKIDRDGKIYVWDKVASIYSQTRLKSASQFVLPHI 458 

Y ++ L++ER I+ YL K+++ IYVWD + IY ++ KS SQF P I
Sbjct: 400 ICSTYCFISSIPLNQERDHIASYLEQKLNKTQSIYVMDDTSKIYLDSKAKSVSQFSSPDI 459 

Query: 459 OTAQKMffiKILKIJELLQHGAKyFILNKNEKLPlIELKSDIKKHYQEVPLSNITHFVLYRFK 518

NT ++++ KIL+DELL++ A Y ++N+ + LP ++ + +Y+ F++Y+ K 

Sbjct: 460 NTQKESHRKILEDELLENKAAYIVVNRYKNLPKIIQKVLSTNYKVDKQITTKSFIVYQKK 519



A related GBS gene <SEQ ID 8727> and protein <SEQ ID 8728> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
SRCFLG: 0 

McG: Length of UR: 34 

Peak Value of UR: 2.23 

Net Charge of CR: 0 
McG: Discrim Score: 7.72 
GvH: Signal Score (-7.5): -2.21

Possible site: 60
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 61 
ALOM program count: 5 value: -9.18 threshold: 0.0 



Likelihood = - 
Likelihood = - 
Likelihood = - 
Likelihood = - 
Likelihood = - 
PERIPHERAL Likelihood = 
modified ALOM score: 2.34 
icml HYPID: 7 CFP: 0.457 

*** Reasoning Step: 3 



• 190 ( 168 - 194) 
■ 155 ( 134 - 160) 

• 108 ( 86 - 113) 
- 262 ( 246 - 265) 



bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF02392(331 - 978 of 1764) 
EGAD|43696|MJ1079(2 - 379 of 397) conserved hypothetical protein {Methanococcus jannaschii}
OMNI|MJ1079 conserved hypothetical protein GP|1591727|gb|AAB99075.1||U67550 conserved hypothetical protein {Methanococcus jannaschii} PIR|F64434|F64434 hypothetical protein MJ1079 - Methanococcus jannaschii

%Match = 3.1
%Identity = 25.6 %Similarity = 50.7

Matches = 57 Mismatches = 100 Conservative Sub.s = 56 



*LLIANI*LSVHPTSFFTXXXN*LXXSSIWLLIVQFIAYWSGIYFYKLVYWAQSEIVSIGMTLIFYIMNIVLG 

: |:: |: |: | 
MLNLLYL I LGI I CGT I TGL 



426 447 477 507 537 567 597 

FGGMYPIQW-ALPFMLISLWFL---IKFCVDNIVDEAFIFYGILARFSLFIDPQTLIFWLCSFVLLTATNIKQKQSLRGF

I |::| II |:» ' ! I I < > < I I > I ' I I I <l > I I 1=111 
FPGIHPNNIVALSFLILPYFGLDNYIPFLIGLVITBKFINF-IPSAFLGTODDETAVSALPMHKLTLNGNGYEAIVLAGF 
30 40 50 60 70 80 90 






627 657 687 717 747 774 
YQFLCWFGMILIAYWGYFMFmQIISSYIDKAIFYPFTYFARTKESFLLSLai-QIVVLLGSGC

:| III ::==:= |::: I II II I : = : :::|:: II 

GSYLGWFSILISLFLMSIBHFDVKAFYCSI--KIFIPFILIAFILYQIFTAKSVWEVLVIFLSGIFGIAVLYCSEAFNI 



- - -LFGLWDFIQNRKKASYQ - IGLNFIACIFI 

= lh :| I I : = = h= II 

TLTAIFTGMFGIPLLINNLKTYKIKSQMMAFPDFELKFLKSSFFA TIAIIILUvILSKYIIjLFIRKVNFKFLSLFFI 



906 948 978 1008 1038 1068 1098 

IYAIMAIFSRDFN LYH---FLPALPFGLLLTSNKITILYQKVIDRRSHRRQYFSGKSLIVDLFVKKTYYLPLLI1VSL 

|: : : :| :|| :| |, ||| : : 

I FCSbWI igsyntyli YHI I VYLTAI YIGLLAVKSNTNLSNMMNVL I FPTILYFLRG 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1093 

A DNA sequence (GBSx1168) was identified in S.agalactiae <SEQ ID 3377> which encodes the amino
acid sequence <SEQ ID 3378>. This protein is predicted to be anaerobic ribonucleotide reductase (nrdD). 
Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3722 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10253> which encodes amino acid sequence <SEQ ID 
10254> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD00215 GB:U73336 anaerobic ribonucleotide reductase 
[Lactococcus lactis subsp. cremoris] 
Identities = 539/725 (74%), Positives = 616/725 (84%), Gaps = 7/725 (0%) 

Query: 10 MTESDIKVIKRDGRLVSFDKYKIYTALLKASNKVIKMSPLVEAKLEMIADHVIAEIYNRF 69 

+T +1 VIKRDGR V F+ KI+ Ah KA+ KV V L + D V++EI++RF 
Sbjct: 10 VTLEEINVIKRDGRSVKFNSEKIFDALTKAAKKVELTDKSV LSELTDRWSEIFSRF 66 

Query: 70 KDNIKIYEIQNIVEHKLLEANEYAIAQEYINYRTQRDFERSQATDINFSIGKLINKDQTV 129 

+N+KIYEIQ+IVE +LLE+ E A+A+EYI+YR RD R++ATDINF+I KLIN+DQTV 
Sbjct: 67 SENVKIYEIQSIVEQELLESGETALAEEYISYFJiNRDl 1 ARTKATDlNFTIEKLINRDQTV 126 

Query: 130 VNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLDYSPYTPMTN 189 

VNENANKDS+VFNTQRDLTAG V K+IGLK+LP HVANAHQKGDIHYHDLDYSP+T M N 
Sbjct: 127 VNENANKDSNVFNTQRDLTAGAVSKAIGLKLLPPHVANAHQKGDIHYHDLDYSPFTTMAN 186 

Query: 190 CCLIDFKGMLANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTADRIDEFLAP 249 

CCLIDFK M NGFK+GNA+V+SPKSIQTATAQ SQIIANVASSQYGGC+ DR DE LAP 
Sbjct: 187 CCLIDFKlSMFFJ!IGFKLGNAQvnSPKSIQTATAQASQIIANVASSQYGGCSFDRADEVLAP 246 

Query: 250 YAQLNYQKHLKDAKEWIED-KQEDYARAKTQKDIYDAMQSLEYEINTLFTSNGQTPFTS 308 

YA+LNYQKHIJKDA++W+ D K+E YAR KT KDIYDAMQSLEYEINTLFTSNGQTPF + 
Sbjct: 247 YAKIiNYQKHLKDAQKWIDGDEKRFAYAREKTAKDIYDAMQSLEYEINTLFTSNGQTPFVT 306 

Query: 309 LGFGLGTNWFEREIQKAILKIRIQGLGSEHRTAIFPKLIFTLKKGLNLEEDSPNYDIKQL 368 
Query: 369 ALECATKRMYPDVLSYDKIIDLTGSFKAPMGCRSFLQGWRDANGQDVTSGRMNLGVVTVN 428 

ALEC+TKRMYPD+LSYDKI++LTGSFKA MGCRSFliQGW+DANG DVT+GR NLGWTVN 

Sbjct: 367 ALECSTKRKYPDILSYDKIVELTGSFKaSMGCRSFLQGWKDAMGNDVTAGRl^LGVVTVlI 426 

Query: 429 LPRVAMESNGDMDKFWEIFNERMSIARDALVYRVERVKEAIPANAPILYQYGAFGERLGK 488 

LPR+A+E+ G+ +KFWEIFNER+ IA DAIi +RVER KEA P NAPIL+ GA G RL 

Sbjct: 427 LPRI ALEAAGNKEKFWEI FNERVE IAHDALAFRVERAKEAQPKNAPI LFMNGALG - RLDS 485 

Query: 489 YDNVDRLFNHRRATVSLGYIGLYEVASVFYGGDWEDNHQAKAFTVDIVRKMKQLCADWSD 548 

+VD L+N+ RATVSLGYIGLYEVA+ FYG WE N +AKAFT++IV++M + C DWS 

Sbjct: 486 EGSVDDLYWNERATVSLGYIGLYEVATTFYGPTWESMPEAiCAFTIEIVKRMHEDCEDWSK 545 

Query: 549 EYDYHFSVYSTPSESLTDRFCRLDTEKFGIVTDITDKEYYTNSFHYDVRKNPTPFEKLDF 608 

YH+SVYSTPSESLTDRFCR+D EKFG V DITDK+YYTNSFHYDVRKNPTPFEKL+F 

Sbjct: 546 ASGYHYSVYSTPSESLTDRFCRMDKEKFGSVADITDKDYYINSFHYDVRKNPTPFEKLEF 605 

Query: 609 EKIYPETGASGGFIHYCEYPVLQQNPKALEAVWDYAYDRVGYLGTNTPIDKCYQCQFEGD 668 



Query: 669 FTPTDRGFTCPNCGNSDPKTVDWKRTCGYLGNPQARPMVNGRHKEI SARVKHMNGS - SI 727 

FTPT+RGF CP CGN DPKT DWKRTCGYLGNPQARPMV+GRHKEI S+RVKHMNGS 
Sbjct: 665 FTPTERGFKCPQCGNDDPKTCDWKRTCGYLGNPQARPMVHGRHKEISSRVKHMNGSVGA 724 

Query: 728 KNQGN 732 
N GN 

Sbjct: 725 LNDGN 729 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3379> which encodes the amino acid 
sequence <SEQ ID 3380>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2975 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 641/731 (87%), Positives = 680/731 (92%) 

Query: 1 MMVLERERFMTESDIKVIKRDGRLVSFDKYKIYTALLKASNKVIKMSPLVEAKLEMIADH 60 

M+ LE ++ + DIKVIKRDGRLV+FD KIY+ALLKAS KV +MSPLVEAKLE I+D 
Sbjct: 1 IWSLEEDKVWQPDIKVIKRDGRLWFDSTKIYSALLKASMKVTRMSPLvEAECLEAISDR 60 

Query: 61 VIAEIYNRFKDNIKIYEIQNIVEHKLLEANEYAIAQEYINYRTQRDFERSQATDINFSIG 120 

+ IAEI RF NIKIYEIQNIVEHKLTj ANEYAIA+EYINYRTQRDF RSQATDINFSI 
Sbjct: 61 IIAEIIERFPTNIKIYEIQNIVEHKLLAANEYAIAKEYINYRTQRDFARSQATDINFSID 120 

Query: 121 KLINKDQTVVNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLD 180 

KLINKDQTWNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLD 
Sbjct: 121 KIiINKDQTvVNENANKTJSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDDD 180 

Query: 181 YSPYTPMTNCCLIDFKGMLANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTA 240 

YSPYTPMTNCCLIDFKGMLANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTA 
Sbjct: 181 YSPYTPMTNCCLIDFKGMLANGFKIGNAETOSPKSIQTATAQISQIIANVASSQYGGCTA 240 

Query: 241 DRIDEFIAPYAQIOTQKHLKDAKEWIEDKQEDYARAKTQKDIYDAMQSLEYEINTLFTS 300 

DRIDEFLAPYA+LN++KH+ DAK+W++E K+E YA KTQKDIYDAMQSLEYEINTLFTS 
Sbjct: 241 DRIDEFLAPYAELNFKKHMADAKKWIWTKRESYAFEKTQKDIYDAMQSLEYEINTLFTS 3 00 

Query: 301 NGQTPFTSLGFGLGTNWFEREIQKAILKIRIQGLGSEHRTAIFPKLIFTLKKGLNLEEDS 360 
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NGQTPFTSLGFGLGT+WFEREIQKAIL IRI GLGSEHRTAIFPKLIFT+K+GENIiE DS 
Sbjct: 301 NGQTPFTSLGFGLGTSWFEREIQKAILTIRI1IGLGSEHRTAIFPKLIFTVKRGLNLEPDS 360 

Query: 361 PNYDIKQIiALECATKRRyPDVLSYDKllDLTGSFKAPffiCRSFLQGWRDANGQDVTSGRM 420 

PNYDIK LALECaTKRMYPD+LSYDKIIDLTGSFK+PMGCRSFLQGW+D NGQDVTSGRM 
Sbjct: 351 PNYDIKTLALECATKRMYPDMLSYDKIIDLTGSFKSPMGCRSFLQGWKDENGQDVTSGRM 420 

Query: 421 NLGWTVMjPRVAMESNGDtClKFWEIFNERMSIARDMjVyRVERVKEAIPjySIAPILYQYG 480 

NLGWT+NLPR+AMESNGDMDKFWE+FNERM I++DAL+YRVERV EA PANAPILYQYG 
Sbjct: 421 ISniGVVTmLPRIAMESNGDMDKF^LFIffiRMLISKDALIYRVERVTEAKPANAPILYQYG 480 

Query: 481 AFGERLGKYDNVDRLFNHRRATVSLGYIGLYEVASVFYGGDWEDNHQAKAFTVDIVRKMK 540 

AFG+RL K NV+ LF +RRATVSLGYIGLYEVASVFYGG WE N AKAFT+ IV+ MK 
Sbjct: 481 AFGKRLEKTGNVNDLFKNRRATVSLGYIGLYEVASVFYGGQWEGNPDAKAFTLSIVKAMK 540 

Query: 541 QLCADWSDEYDYHFSVYSTPSESLTDRFCRLDTEKFG1VTDITDKEYYTNSFHYDVRKNP 600 

Q C DWSDEY YHFSVYSTPSESLTDRFCRLDTEKFGIVTDITDKEYYTNSFHYDVRK+P 
Sbjct: 541 QACEDWSDEYGYHFSVYSTPSESLTDRFCRLDTEKFGIVTDITDKEYYTNSFHYDVRKSP 600 

Query: 601 TPFEKLDFEKIYPETGASGGFIHYCEYPVLQQNPKALEAVWDYAYDRVGYLGTNTPIDKC 660 

TPFEKLDFEK YPE GASGGFIHYCEYPVLQQNPXALEAVWDYAYDRVGYLGTNTPIDKC 
Sbjct: 601 TPFEKLDFEKDYPFAGASGGFIHYCEYPVLQQNPKALEAVWDYAYDRVGYLGTNTP1DKC 660 

Y CQFEGDFTPT+RGFTCPNCGN+DPKTVDWKRTCGYLGNPQARPMVNGRHKEISARVK 
Sbjct: 661 YNCQFEGDFTPTERGFTCPNCGNNDPKTVDWKRTCGYLGNPQARPMVNGRHKEISARVK 720 

Query: 721 HMNGSSIKNQG 731 

HMNGS+IK G 
Sbjct: 721 HMNGSTIKYPG 731 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
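
The alignment summaries in this Example give each count together with a percentage. As an illustrative sketch only (it is not part of the original analysis), the following Python parses such a summary line and recomputes the identity percentage. The function names are hypothetical. Truncation rather than rounding is an assumption, chosen because it is consistent with the identity figures quoted here (641/731 is reported as 87%); the Positives percentages are deliberately not recomputed, since the same rule is not assumed to hold for them.

```python
import re

# Matches fields such as "Identities = 641/731 (87%)"; tolerant of OCR spacing.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return (100 * count) // length


line = "Identities = 641/731 (87%) , Positives = 680/731 (92%)"
stats = parse_summary(line)
count, length, reported = stats["Identities"]
identity_matches_report = truncated_percent(count, length) == reported
```

A mismatch between the recomputed and the reported identity percentage would suggest a misread digit in the count or in the alignment length.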

Example 1094 

A DNA sequence (GBSx1169) was identified in S.agalactiae <SEQ ID 3381> which encodes the amino 
acid sequence <SEQ ID 3382>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5372 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
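
Each "Final Results" block lists three candidate locations with a certainty value. The sketch below, given for illustration only, shows one way such a block could be read programmatically and the highest-certainty location selected. The function names and the regular expression are assumptions introduced here, not part of the localisation program's own interface; the pattern is written to tolerate the stray spaces that appear inside the numbers in this text.

```python
import re

# One result line: location name, optional dashes, "Certainty=<number> (<verdict>)".
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for one Final Results block."""
    results = {}
    for site, number, verdict in RESULT_RE.findall(text):
        certainty = float("".join(number.split()))  # drop spaces inside the number
        results[site] = (certainty, verdict.strip())
    return results


def predicted_site(results):
    """Location with the highest certainty (first listed wins a tie)."""
    return max(results, key=lambda site: results[site][0])


block = """
bacterial cytoplasm Certainty=0. 5372 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 .0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
best = predicted_site(results)
```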

A related DNA sequence was identified in S. pyogenes <SEQ ID 3383> which encodes the amino acid 
sequence <SEQ ID 3384>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.6084 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 28/47 (59%), Positives = 40/47 (84%), Gaps = 1/47 (2%) 

Query: 1 MGKYQLDYKGQAQVQKFHEKHSTGENANQKSRLKDLRKQFLEKAKKK 47 
MGKYQLDYKG QV++FHEKHS + ++KSR+++L+ +FLEK+KK+ 
Sbjct: 1 MGKYQLDYKGMQQVERFHEKHSK-KKTDKKSRVQELKARFLEKSKKQ 46 
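
The identity and gap counts quoted for this alignment can be recomputed directly from the two aligned rows. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the Query row is used in a cleaned form (the scanned characters are read as `...KDLRKQFLEKAKKK`), which is an assumed repair of the printed text rather than a quotation of it.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# Aligned rows; "-" marks a gap. The query row is an assumed OCR repair.
query = "MGKYQLDYKGQAQVQKFHEKHSTGENANQKSRLKDLRKQFLEKAKKK"
sbjct = "MGKYQLDYKGMQQVERFHEKHSK-KKTDKKSRVQELKARFLEKSKKQ"

identities = count_identities(query, sbjct)
gaps = query.count("-") + sbjct.count("-")
alignment_length = len(query)
```

With the rows as given, the recomputed counts agree with the summary line (28 identities and 1 gap over 47 columns), which supports the assumed reading of the Query row.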

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1095 

A DNA sequence (GBSx1170) was identified in S.agalactiae <SEQ ID 3385> which encodes the amino 
acid sequence <SEQ ID 3386>. Analysis of this protein sequence reveals the following: 

Possible site: 51 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0436 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB95794 GB:AL359949 putative oxidoreductase [Streptomyces 
coelicolor A3 (2) ] 

Identities = 91/299 (30%), Positives = 147/299 (48%), Gaps = 7/299 (2%) 



Query: 2 LQLGIVGLGGISQKAYr.PYMRQVTGVHWHLFTRQKQILEEV--NMLFGSSTAYDSLDSLA 59 

+++G +GLG I+QK YLP ++G+HLTR LV + + + LD+L 
Sbjct: 1 MKVGCIGLGDIAQKGYLPVLAALPGIELHLOTRTPATLTRVADKLRIPPAQRHADLDArjI. 60 

Query: 60 EHPLDGVFIHVATSAHFDIAKLFLKKGIPVFMDKPLTEDYTSTKALYDLAKDHKTFLMAG 119 

LD F+H T+AH +1 L+ G+P ++DKPL + ++ L LA++ T h G 
Sbjct: 61 AQGLDAAFVHAPTAAHPEIVTRLLEAGVPTYVDKPLAYELSJ3SERLVTLAEERGTSLAVG 120 

Query: 120 FNRRFAPRIMEMKKVEDKiraiRTFKNAVN^ADFQYKI.FDMFIHPLDTALFLlTJNWKRG 179 

FNRR AP + + +1 KN P D + + D FIH +DT FL V 
Sbjct: 121 FNRRHAPGYAQCAE-HPRELILMQKNRTGLPEDPRTMILDDFIHVVDTLRFLVPGPVDDV 179 

Sbjct: 

Query: 240 DGFDRRAI-GFGSWASTLEKRGFEPMIDAFIQAITTGVNPISPKSSLLSHFICDQINKA 297 

D + + G W +RG E 4- AF+ A+ +G +S + +L +H +C+++ +A 

Sbjct: 238 DHKGQPTVRRRGDWVP V7ARQRGI EQAVLAFLDAVRSG - EVLSARDALATHELCERWRA 295 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3387> which encodes the amino acid 
sequence <SEQ ID 3388>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:AAF96942 GB:AE004430 oxidoreductase, Gfo/ldh/MocA family [Vibrio cholerae] 
Identities = 103/304 (33%), Positives = 158/304 (51%), Gaps = 11/304 (3%) 

Query: 4 IMIGIVGLGAISQKAYLPYMRQLSDITSfflLSTRNAaVRQQVGQLFGHAILYSDVKELSKT 63 

+ I ++GLG I+QKAYLP + Q DI L TRN V + + + +D +++ + 
Sbjct: 1 MKIAMIGLGDIAQKAYLPVIAQWPDIELVLCTRNPKVLGTIATRYRVSATCTDYRDVLQY 60 

[Remaining alignment rows not recoverable from the source; row coordinates as printed: Query 124, 178, 238, 293; Sbjct 61, 121, 181, 241, 301. Surviving match-line fragments:] 

+FD FIHPLD+ 
H YH+ GLL+++ V T ASMN Q G E + 
L +GF+ M+ +L+ 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 168/308 (54%), Positives = 223/308 (71%) 



[Alignment rows not recoverable from the source; row coordinates as printed: Query 1, 61, 121, 181, 241, 301; Sbjct 3, 63, 123, 183, 243. Surviving match-line fragments:] 

LDGVFIH ATSAH ++A LFL +GIPVFMDKP+ ++Y TK r,YDT J AK+++TFLMAGF 
NRRF PR+ ++ + K + KN +N P D +KLFD FIHPLDTALFLT 
h QV VTL T+S ASMNLQSGSRRE++E4-+ E TY L++L LS+ 
G ++R +GF SW +TL KRGFE MIDAF++AI+TGVNP+SP+SSLLSH+IC QI 



SEQ ID 3386 (GBS309) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 68 (lane 10; MW 63kDa). 

GBS309-GST was purified as shown in Figure 212, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1096 

A DNA sequence (GBSx1171) was identified in S.agalactiae <SEQ ID 3389> which encodes the amino 
acid sequence <SEQ ID 3390>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2983 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04222 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 52/129 (40%), Positives = 70/129 (53%), Gaps = 5/129 (3%) 

Query: 39 FEDWLDHNLNMELGVGVPDNFVPYIQFVSFDNDKNAIGFLNLRDRLNDTLLEKGGHIGYS 98 

FE L + + GV +P N V + IG +N+R LND L +GGHIGY 

Sbjct: 43 FEHLLKTLKDYQHGVNLPANRVANTTYVEjVHEQKRLIGAINIRHTLNDWLHHRGGHIGYG 102 

Query: 99 IRPRQRGKGYAKEQLKLGIEQAHLKNINEILVTCHVDNDASKSVILANGGVIEDCIjHQ-- 156 

IRP +RGKGYA LKLG+E+A + ++L+TC +N S I NGGVL+ + 
Sbjct: 103 IRPSERGKGYATLMLKLGLEKAAALGLEKVLITCDKENLPSARTIQRNGGVLDSEVVDER 162 

Query: 157 TERYWI 162 

+RYWI 

Sbjct: 163 GIAIQRYWI 171 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3391> which encodes the amino acid 
sequence <SEQ ID 3392>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2195 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/164 (54%), Positives = 115/164 (69%), Gaps = 4/164 (2%) 

Query: 1 MKLPJ^PVLEDKEEIIAMKEFQKESSSVDG--GFYEPTMHFEDWLDHNM^IELGVGVPDN 58 

M++RRP L+DK+ +L+M EF ++ S+ DG F ++E WL+ +L E+G+ 

Sbjct: 1 MEIRRPTLKDKDAVLSMINEFLEQKSATDGLVWFNVNDFNYETWLEDSLRQEMGLS--SQ 58 

Query: 59 FVPYIQFVSFDNDNNAIGFLNLRLRIjNDTLLEKGGHIGYSIRPRQRGKGYAKEQLKLGIE 118 

VP IQ+V+FD + AIGFLNLRLRUSM- llekgghigys+rp qrgkgyake lk + 
Sbjct: 59 GVPAIQYVAFDERSQAIGFLNLRLRLNERLLEKGGHIGYSVRPSQRGKGYAKEMLKQAVS 118 

Query: 119 QAHLKNINEILVTCHVDNDASKSVILANGC-VLEDCLHQTERYWI 162 

A KNI ILVTC N AS4+VI+AN G+LED TERYWI 
Sbjct: 119 YAISKNITTILVTCDETNVASRAVIVANVGILEDSRGGTERYWI 162 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1097 

A DNA sequence (GBSx1172) was identified in S.agalactiae <SEQ ID 3393> which encodes the amino 
acid sequence <SEQ ID 3394>. This protein is predicted to be anaerobic ribonucleotide reductase activator 
protein (nrdG). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4239 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD00216 GB:U73336 anaerobic ribonucleotide reductase activator 
protein [Lactococcus lactis subsp. cremoris] 

Identities = 152/198 (76%), Positives = 176/198 (88%) 



Query: 8 NTPKPGEWKSEELSHGHIIDYKAFNFVDGEGVRNSLYVAGCMFHCKGCYNTATWSFRAGI 67 
N PKPGEW+ + +ELS +I DYK FNFVDGEGVR SLYV+GCMFHC+GCYN ATWSFR G 
Sbjct: 2 NNPKPGEWRADELSQNYIADYKPFNFVDGEGVRCSLYVSGCMFHCEGCYNQATWSFRYGR 61 





Query: 68 PYTKELEDQIMTDLEQPYVQGLTLLGGEPFLNTGILLPLLQRIRRELPEKDIWSWTGYTW 127 
PYTKELED+IM DL +PYVQGLTLLGGEPFLNT L+PLL+RIRRELP+KDIWSWTGYTW 
Sbjct: 62 PYTKELEDKI^^IIAEPYVCGLTLLGGEPFLOT'TFLIPI 1 I I KRIRRELPDKDIWSWTGYTW 121 

Query: 128 EEMMLETQDKLEMLSLIDILVDGRFDQSKRNLMLQFRGSSNQRIIDVQKSLKEGEVVIWE 187 
EEMMLET DKliEML L+D+LVDGRF+ SK+NLMLQFRGSSNQRIIDV KS +G+VVIWE 
Sbjct: 122 EEM^LETDDKLEMLDLI£3VliVDGRFELSKKNLMLQFRGSSNQRIIDVPKSRSKGQVVIWE 181 

Query: 188 GLNDGDNSYEQVKRDDLL 205 
LNDG+N++EQ+ ++ L+ 
Sbjct: 182 KLNDGENNFEQIHKEKLI 199 





A related DNA sequence was identified in S.pyogenes <SEQ ID 3395> which encodes the amino acid 
sequence <SEQ ID 3396>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4111 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 167/202 (82%), Positives = 186/202 (91%) 



Query: 4 EASWNTPKPGEWKSEELSHGHIIDYKAFNFVDGEGVRNSLYVAGCMFHCKGCYNTATWSF 63 

E WN PKP EW++EELS G IIDYKAFNFVDGEGVRNSLYV+GC+FHCKGCYN ATWSF 
Sbjct: 4 . EKMNNPKPKEWQAEELSQGRIIDYKAFNFTOGEGVRNSLYVSGCLFHCKGCYNAATWSF 63 

Query: 64 RAGIPYTKELEDQIMTDLEQPYVQGLTLLGGEPFLNTGILLPLLQRIRRELPEKDIWSWT 123 

+AG+PYT+ELE+QIMTDL QPYVQGLTLLGGEPFLNTGIL+PL++RIRRELPEKDIWSWT 
Sbjct: 64 KAGMPYTQELEEQIMTDLAQPYVQGLTLLGGEPFLNTGILIPLIKRIRRELPEKDIWSWT 123 

Query: 124 GYTWEEMMLETQDKLEMLSLIDILVDGRFDQSKRNLMLQFRGSSNQRIIDVQKSLKEGEV 183 

GYTWEEMMLET DKLEMLSLIDILVDGRFD +K4NLMLQFRGSSNQRI IDVQKSL EV 
Sbjct: 124 GYTWEE^LETPDKLEMLSLIDILVDGRFDITKKNLMLQFRGSSNQRIIDVQKSLAAKEV 183 

Query: 184 VIWEGLNDGDNSYEQVKRDDLL 205 

+ IW+ LNDGD ++EQ+ R+DLL 
Sbjct: 184 I IWDKLNDGDQTFEQISREDLL 205 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
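
Because each alignment row prints its start and end coordinates, a row can be checked for dropped or inserted characters: the start coordinate plus the number of residues (ignoring gap characters and stray spaces) minus one should equal the end coordinate. The following sketch illustrates the check; the function name is hypothetical and the rows are copied from alignments in this section.

```python
def row_is_consistent(start, residues, end):
    """True if the row's letters account for exactly start..end."""
    # Count letters only, so gap characters ("-") and stray spaces are ignored.
    ungapped = sum(1 for ch in residues if ch.isalpha())
    return start + ungapped - 1 == end


rows = [
    (184, "VIWEGLNDGDNSYEQVKRDDLL", 205),
    (184, "I IWDKLNDGDQTFEQISREDLL", 205),
    (250, "YAQLNYQKHLKDAKEWIED-KQEDYARAKTQKDIYDAMQSLEYEINTLFTSNGQTPFTS", 308),
]
checks = [row_is_consistent(start, residues, end) for start, residues, end in rows]
```

A row that fails the check has most likely lost or gained a character in transcription, so its sequence text should be treated with caution.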

Example 1098 

A DNA sequence (GBSx1173) was identified in S.agalactiae <SEQ ID 3397> which encodes the amino 
acid sequence <SEQ ID 3398>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 



WO 02/34771 



PCT/GB01/04789 



INTEGRAL Likelihood = -3.03 Transmembrane 102 - 118 ( 101 - 119) 

Final Results 

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD2444S GB:AF118389 unknown [Streptococcus suis] 
Identities = 97/240 (40%), Positives = 151/240 (62%), Gaps = 1/240 (0%) 

Query: 2 IKILIPTAKEMKV-CQNIAWPKLSAQTKIIIDYFSTLTVSDLEDIYRINTSAARCEAQRW 60 

+KI+IP AKE+ +N ++ LS ++K ++D S V + Y++N + A EA RW 
Sbjct: 1 MKIIIPNAKEVOTNLENASFYLLSDRSKPVLDAISQFDVKKMAAFYKLNEAKAELEADRW 60 

Query: 61 QDFKAKQLTLNPAIKLFNGLMYRNIKRHNLSTSEAQFMENSVFITSALYGIIPAMTLISP 120 

+ Q PA +L++GLMYR + R + + E ++ + V + +ALYG+I ISP 
Sbjct: 61 YRIRTGQAKTYPAWQLYDGLMYRYMDRRGIDSKEENYLRDHVRVATALYGLIHPFEFISP 120 

Query: 121 HRLDFNTKIKINNNSLKVFWRENYDTFMQSDDIKVSLLSNEFETVFSPKERQKLIHLNFI 180 

HRLDF +KI N SLK +WR YD + D++4+SL S+EFE VFSP+ +++L+ + F+ 
Sbjct: 121 HRLDFQGSLKIGNQSLKQYWRPYYDQEVGDDEL1LSLASSEFEQVFSPQIQKRLVKILFM 180 

Query: 181 EDRDGQLKTHSTISKKARGKCLTAMMEMNCQTLEHLKQLRFDGFCYDNELSDSKQLTFVK 240 

E++ GQLK HSTISKK RG+ L+ + +NN Q L ++ + DGF Y S + QLTF++ 
Sbjct: 181 EEKAGQLKVHSTISKKGRGRLLSWLAKNNIQELSDIQDFKVDGFEYCTSESTANQLTFIR 240 

A related GBS nucleic acid sequence <SEQ ID 10941> which encodes amino acid sequence <SEQ ID 
10942> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3399> which encodes the amino acid 
sequence <SEQ ID 3400>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3759 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/242 (47%), Positives = 155/242 (63%) 

Query: 1 MIKILIPTAKEMKVCQNIAWPKLSAQTKIIIDYFSTLTVSDLEDIYRINTSAARCEAQRW 60 

M+ LIPTAKEM + + L ++ 1+ + +T DL YRI +A+ E QRW 

Sbjct: 1 MLTFLIPTAKEMTIPKESHPHLLPQDSQAILKIMAAMTTEDLAKSYRIKEESAKKEQQRW 60 

Query: 61 QDFKAKQLTLNPAIKLFNGLMYRNIKRHNLSTSEAQFMENSVFITSALYGIIPAMTLISP 120 

QD ++Q PA +LFNGLMYR+IKR L+T E ++ V+ITS+ YGIIPA 1+ 
Sbjct: 61 QDMASQQSLAYPAYQLFNGLMYRHIKRDKLTTQEQAYIiTQQvYITSSFYGIIPANHPIAE 120 

Query: 121 HRLDFNTKIKINNNSLKVFi»JRENYDT?MQSDDIMVSLLSNEFETVFSPKERQKLIHLNFI 180 

HR DF+T+IKI SLK +WR Y+ F + ++SLLS+EF+ VFS +Q I F+ 
Sbjct: 121 HRHDFHTRIKIEGQSLKSYWRPCYNQFAKEHPQVISIiLSSEFDDVFSKDCKQLWISPKFM 180 

Query: 181 EDRDGQLKTHSTISKKARGKCLTA^I^ENNCQTLEHLKQLRFDGFCYDNELSDSKQLTFVKKQ 242 

+++GQ KTHSTISKKARG LTA MENNCQT++ LK L F GF Y +LS + ++KK+ 
Sbjct: 181 AEKEGQFCTHSTISIOCARGiAFLTACMEtMC^^ 242 

SEQ ID 3398 (GBS428) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 6; MW 30.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 4; MW 55kDa). 

GBS428-GST was purified as shown in Figure 220, lane 6-7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 


Example 1099 

A DNA sequence (GBSx1174) was identified in S.agalactiae <SEQ ID 3401> which encodes the amino 
acid sequence <SEQ ID 3402>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -0.59 Transmembrane 3 - 19 ( 3 - 19) 

Final Results 

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10251> which encodes amino acid sequence <SEQ ID 
10252> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 17 MSYPYKANHSIESITLKVNDLENLWFYSDIIGLTVIDKSSTRALLGVNQKIPLIILEKT 76 

M + + N ++ + +KV+DL + FY +IIG V+++S A L N + PL+++E+ 
Sbjct: 1 MEFHRQPNTFTOLWIKVSDLSRALTFYQEIIGFQVLERSERSATLTANGRTPLLVIEQP 60 

Query: 77 E LEKHSTYGLYHTAILVPDEYHLSLALNHLLSQHIPLEGGADHGYSNAIYLSDPEGN 133 

+ ++ T GLYH A+L+P L LNHLL PL+G +DH S AIY +DP+GN 
Sbjct: 61 DPVIAKQPRTTGLYHFALLLPSRADLGRFLNHLLQSGYPLQGASDHLVSEAIYFADPDGN 120 

Query: 134 GIEIYNDKDISMWDIRESGQIIGITERLDIDNLLDSLVNVPNNYKLSEKTSIGHIHLSVK 193 

G+E+Y D+ S WD +G++ TE + +NLL + P L +T +GHIHL V 
Sbjct: 121 GVEVYADRPSSSWD-WSNGEVKMSTEPIHAENLLAEGKDEPWT-ALPPETILGHIHLHVA 178 

Query: 194 DAKISSKLYQNVFGLDEKFAIPT-ASWIASGNYHHHIAFNNWAGPNLSKNQEDRPGISLL 252 

+ + Y G + + A +I++GNYHHH+ N W G E G+ 

Sbjct: 179 NLFEAETFYIEGLGFNWARLGNQALFISTGNYHHHIGLNTWNGVGAPTPPEHSVGLKWF 238 

Query: 253 TIAYNDDNLFRDSLKKAQLYQLTFLEKQDHYYIIE 287 

++ Y + + ++ + + K ++I+ 

Sbjct: 239 SLTYPSEEVRAKTVNRLETIGFQVERKHGEEWID 273 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3403> which encodes the amino acid 
sequence <SEQ ID 3404>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0936 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 143/282 (50%), Positives = 194/282 (68%) 

Query: 17 MSyPYKMffiSIESITLKVlffiLENLWFYSDIIGLTVIDKSSTRMjLGVNQKIPIjllLEKT 76 
M YPY + S+ V DL + FY+ IIGL V+ + +T L + K ++ L +T 

Sbjct: 1 MIYPYNSTISLGWSIjNVTDLAI*1TTFYTSIIGLQVIjSQDTTSRQLTTDGKTVILELRQT 60 

Query: 77 ELEKHSTYGLYHTAILVPDEYHLSLALNHLLSQHIPLEGGADHGYSNAIYLSDPEGNGIE 136 
L YGLYHTA LVPD 4 L L LNH L++ I LEG ADHG+S AIYLSDPEGNGIE 

Sbjct: 61 PLPGDKAYGLYHTAFLVPDRHSLGLVLNHFLTRSISLEGAZSDHGHSEAIYLSDPEGNGIE 120 

Query: 137 IYNDKDISMWDIRESGQIIGITERLDIDNLLDSLWVPNNYKLSEKTSIGHIHLSVKDAK 196 

IY+DK + WDIR++GQIIG+TE D ++L+ L ++P ++ L++ T I H+HLSVK+A 
Sbjct: 121 IYHDKAVEHWDIRDNGQIIGVTEPTDTKSILEQLTDIPKHFLLAQDTRIRHVHLSVKNAL 180 


Query: 197 ISSKLYQNVFGLDEKFAIPTASWIASGNYHHHLAFNNWAGPNLSKNQEDRPGISLLTIAY 256 

SS LYQ VF L +K IP4ASWIASGNY+HHLAFN+W+ P L K+QE PG++ LTI 
Sbjct: 181 ASSLLYQKVFDLGDKMTIPSASWIASGNYYHHLAFNHl'JSAPYLKKHQEGAPGLAFLTIHI 240 

Query: 257 NDDNLFRDSLKKAQLYQLTFLEKQDHYYIIEDFDGIRIKVVL 298 

LF +LKKA+L+ L L+4 4 ED +GIR+ V+L 
Sbjct: 241 ETPLLFSATLKKARLHGLAILQEDSSSFTTEDEEGIRVNVTL 282 

SEQ ID 3402 (GBS429) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 7; MW 34.2kDa). 

GBS429-His was purified as shown in Figure 214, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1100 

A DNA sequence (GBSx1175) was identified in S.agalactiae <SEQ ID 3405> which encodes the amino 
acid sequence <SEQ ID 3406>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2362 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10249> which encodes amino acid sequence <SEQ ID 
10250> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 10 MVRLIFSDIDGTLINSNFKVTPKTRQGIKQIVAQGATFVPISARMPEAITPIMEQIGIDS 69 

M + +FSD +GTL+ S ++P4-T IK++ A G FVPISAR PIP +Q+ ++ 
Sbjct: 2 MYKAVFSDFNGTLLTSQHTISPRTWVIKRLTANGIPFVPISARSPLGILPYWKQLETNN 61 

Query: 70 YIISYNGALIQDMQQKTIASHTMDGQVALQVCSYVSKHYSKIAWNVYRYHEWYSCDKENE 129 

+++++GALI + + I S +4-4- L++ + +44H + NY ++ ++ D EN+ 
Sbjct: 62 VXVAFSGAIiIMQNLEPIYSVQIEPKDILEINTVLAEH-PLLGV^ 120 

Query: 130 WVQKEEEIVGLQSKEMSLMELEKQDRIHKLLLMGEP3LMGELENTLKAQYPHLSIAQSAP 189 
WV E + ++ + HK4 ++GE + E+E LK ++PHLSI +S 

Sbjct: 121 WVIYERSVTKIEIHPFDEVATRSP HKIQIXGEAEEIIEIEVLLKEECFPHLSICRSHA 177 

Query: 190 YFIEIMAPGIEKGKSAKTLMYLDISLADSIAFGDNYNDtNLLEIVGKGFVMGNAPKDLQ 249 

F+E+M KG + + h DY + + IAFGDN+NDL++LE VG G MGNAP +++ 

Sbjct: 178 NFLEVMHKSATKGSAWPLEDYFGVQTNEVIAFGDNFMDLDMLEHVGLGVAMGNAPNEIK 237 

Query: 250 ERIGNVTQDNDNDGIYYALVE 270 

+ VT N+ DG+ L E 
Sbjct: 238 QAANWTATNNEDGLALILEE 258 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1101 

A DNA sequence (GBSx1176) was identified in S.agalactiae <SEQ ID 3409> which encodes the amino 
acid sequence <SEQ ID 3410>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 



[Alignment rows not recoverable from the source; row coordinates as printed: Query 33, 93, 152, 272; Sbjct 31, 91, 151, 211, 271. Surviving match-line fragments:] 

VEHP+L A R G + L + GY+DGKN+K +Y++AQG+ 
+++GIATP+AQ+L + PI+F+ VTDPV A L S + 
Q+ L++KV+P KR+G++Y E NS V VK+ K++ 
QT R V+ILKG+ ++ E NL++ VN A++ G+ +S 



There is also homology to SEQ ID 2712. 

SEQ ID 3410 (GBS188) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 39 (lane 2; MW 36.6kDa). 

The GBS188-His fusion product was purified (Figure 204, lane 6) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 247), FACS, and in the in vivo passive protection 
assay (Table III). These tests confirm that the protein is immunoaccessible on GBS bacteria and that it is an 
effective protective immunogen. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1102 

A DNA sequence (GBSx1177) was identified in S.agalactiae <SEQ ID 3411> which encodes the amino 
acid sequence <SEQ ID 3412>. This protein is predicted to be probable permease of ABC transporter 
(rbsC). Analysis of this protein sequence reveals the following: 



Possible site: 21 
>>> Seems to have a cleavable N-term signal seq. 

[PSORT transmembrane table only partly recoverable: seven INTEGRAL Likelihood entries, with values truncated in the source (-16, -6, -6, -6, -4, -1, -0); surviving Transmembrane ranges: 264 - 280 and 124 - 160.] 

Final Results 

bacterial membrane --- Certainty=0.7453 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAG07224 GB:AE004801 probable permease of ABC transporter 
[Pseudomonas aeruginosa] 
Identities = 114/285 (40%), Positives = 175/285 (61%), Gaps = 



Query: ILSGISQGLLWSIMAIGVFITFRILDIADLSAEGAFPMGAAVCALCIVNDINPIVATIAG 64 
+ + GL++S++A+GVFI+FR+L DL+ +G+FP+G AVCA I +P AT+A 
Sbjct: LFGALEIGLIFSLVALGVFISFRLLRFPDLTVDGSFPLGGAVCATLIALGWDPYSATLAA 65 



Query: 65 MLGGMLAGLVSGFLHTKMKIPALLTGIITLTGLYSINLLVLGRSNVSFALKNTLVTMVTR 124 

G LAGL +G L+ K+KI LL 1+ + LYSINL ++G+ MV + TL T++ 
Sbjct: 66 TAAGAIAGIATGLLNVKLKIOTLIASILNMIALYSI^RIMGKPNVPLIAEPTLFTLLQP 125 

Query: 125 LGMKLSAVLLIGIVCVGLVILILYLFLNTQLGLAI^RATGDNEAMGQANSIKVDRMKM 184 

L+ h+ + V L+L F TQ GLA+RATG N M +A + M +LG 

Sbjct: 126 EI^SDWFRPLLLVFIVIAAKLLLDWFFTTQKGLAIRATGSNPRNARAQGVNTGGMILLG 185 

Query: 185 YMIGNGLIALSGALIiAQNNGYADrj^GVGTIVIGLASIILAEVMIKYLPLGKRLWSIVLG 244 

I N L+AL+GAL AQ G AD++MG+GTIVIGLA++I+ E ++ h +++LG 
Sbjct: 186 MAI SNALVALAGALFAQTQGGAD I SMGIGT I VIGLAAVI VGES I LPSRRLI LATLAVI LG 245 

Query: 245 SVLYRMI IVFILTTD IDAQMIKLVSAILLALILYVPELRAKL 286 

+++YR I L +D + AQ + LV+A+L+ + L +P ++ +L 
Sbjct: 246 AIVYRFFIALALNSDFIGLQAQDLNLVTAVLVTVALVI PMMKKRL 290 



There is also homology to SEQ ID 2716. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1103 

A DNA sequence (GBSx1178) was identified in S.agalactiae <SEQ ID 3413> which encodes the amino 
acid sequence <SEQ ID 3414>. This protein is predicted to be ABC transporter. Analysis of this protein 
sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3798 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF86640 GB:AF162694 ABC transporter [Enterococcus gallinarum] 
Identities = 171/264 (64%), Positives = 213/264 (79%), Gaps = 1/264 (0%) 

Query: 3 LLELVNLHKTFEKGTVNENHVLRGLDLTIEDGDFISVIGGNGAGKSTLLNCIAGLIPIDQ 62 
+L + +LH+TFEKGT+NENHVLRG+DLT+ GDFI++IGGNGAGKSTLLN IAG IP +Q 
Sbjct: 5 VLTISDLHQTFEKGTINENHVLRGIDLTMNSGDFITIIGGNGAGKSTLLNSIAGTIPTEQ 64 

[Remaining alignment rows not recoverable from the source; row coordinates as printed: Query 63, 123, 103, 243; Sbjct 65, 124, 244. Surviving match-line fragments:] 

- IT+ SV +RSK+ISRVFQDPRMGTA LT+EEN+A+A+KRG R F 
R FK+ L++L LGLENR+ T+ LSGGQRQA+TL MATL +PKL+LLDEHTAAL 
TV +LM LFH+NSG +L DD L+L 



There is also homology to SEQ ID 2720: 



Identities 

Query: 3 LLELVNLHKTFEKGTVNENHVLRGLDLTIEDGDFISVIGGNGAGKSTLLNCIAGLIPIDQ 62 
++EL+N + G + +L + LTI + DF++++GGNGAGKSTL N IAG + + + 
Sbjct: 4 IIELINATVDVDNGFEDAKTILDNVTLTIYEHDFLTILGGNGAGKSTLFNVIAGTLSLTR 63 

[Remaining alignment rows not recoverable from the source; row coordinates as printed: Query 63, 123, 183, 243; Sbjct 64, 124, 183, 243. Surviving match-line fragments:] 

EKR+ +SRVFQD +MGTA +T+ EN+ IA +RG KR + 
h G GLE 4+T A LSGGQRQAL+L MATL +P LLLLDEHTAAL 
DPKTS +M+LT + + + LTALMITH+ME A+ YGNRL+++ G 1+ D+ 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1104 

A DNA sequence (GBSx1179) was identified in S.agalactiae <SEQ ID 3415> which encodes the amino 
acid sequence <SEQ ID 3416>. This protein is predicted to be mannose-specific phosphotransferase system 
component IIAB. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3527 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD46485 GB:AF130465 mannose-specif ic phosphotransferase system 
component IIAB [Streptococcus salivarius] 
Identities = 287/336 (85%) , Positives = 306/336 (90%) , Gaps = 6/336 (1%) 





NIIKE+K GIKALPEELNP E T A V A P G+IPEGTVIGDGKLKINLAR

+DTRLLHGQVAT WTPASKA+RI IVASD+V+KDELRK+LIKQAAP GVKANWPI KLI +

+KDPRFGNT ALILFETVQDALRAIEGGV I ELNVGSMAHSTGKTMVNNVLSMDKDDV

A FEKLRDLGV FDTOKVPND+KK+LFDKE KANV+
Sbjct: 295 ACFEKLRDLGVEFDVRKVPNDSKKDLFDLIKKANVQ 330

A related DNA sequence was identified in S.pyogenes <SEQ ID 3417> which encodes the amino acid
sequence <SEQ ID 3418>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3533 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 288/336 (85%), Positives = 308/336 (90%), Gaps = 6/336 (1%)

Query: 1 MGIGIIIASHGKFAEGIHQSGSMIFGEQEKVQWTFMPNEGPDDLYGHFNNAIAQFDADD 60 

MGIGIIIASHGKFAEGIHQSGSMIFGEQEKVQV\TFI1PNEGPDDLYGHFNNAI QFDADD 
Sbjct: 1 MGIGIIIASHGKFAEGIHQSGSMIFGEQEKVQWTFMPNEGPDDLYGHFNNAIQQFDADD 60 

Query: 61 EVTjVIADLWSGSPFNQASF.VMGENPEFJ<MAIITGLNIjPMIjIQAYTERI#1DANAGVEQVAA 120 

E+LVLADLWSGSPFNQASRV GENP+RKMAIITGMI.PMLIQAYTER+MDA AGVEQVAA 
Sbjct: 61 EILVIADLWSGSPFNQASRVAGENPDRKMAIITGLNLPMLIQAYTERLMDAGAGVEQVAA 120 

Query: 121 NIIKESKEGIKALPEEmPVVEATPVAGvPAOTPAEVKQSGSIPEGTVIGDGKLKINLAR 180 

NIIKESK+GIKALPE+LNPV E V + G+IP GTVIGDGKLKINLAR 

Sbjct: 121 Nil KESKDGI KALPEDLNPVEETAATEKWNAL QGAI PAGTVIGDGKLKINLAR 174 

Query: 181 IDTRLLHGQVATAWTPASKANRIIVASDWSKDEI^KQLIKQAftPGGVKANVVPISKLIE 240 

+DTRLLHGQVATAWTPASKR+RIIVASDW++D+LRKQLIKQAAPGGVKANWPISKLIE 
Sbjct: 175 VDTRIJiHGQVATAWTPASKADRIIVASDErVAQDDLRKQLIKQAaPGGVKANWPISKLIE 234

Query: 241 VAKDPRFGNTRALILFETVQDALt^IEGG\rEIPELW3SKAKSTGKTMVlWLSMDKDDV 300

+KDPRFGNT ALILF4T QDALRA+EGGVEI 3LNVGSMAHSTGKTMVNNVLSMDK+DV
Sbjct: 235 ASKDPRFGNTHALILFQTPQDALRAVEGG\rEIl^LiWGSKJfflSTGKTMVMNVLSMDKEDV 294

Query: 301 AAFEICLRDLGVSFDVRKVPNDAKKNLFDLINKANVK 336 

A FEKLRDLGV+FDVRKVPND+KKNL>F+IiI K N+K 
Sbjct: 295 ATFEKLRDLGVTFDVRKVPNDSKKNLFELIQKTNIK 330 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1105 

A DNA sequence (GBSx1180) was identified in S.agalactiae <SEQ ID 3419> which encodes the amino
acid sequence <SEQ ID 3420>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3873 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06625 GB:AP001517 unknown conserved protein [Bacillus halodurans]
Identities = 89/267 (33%), Positives = 139/267 (51%), Gaps = 3/267 (1%)





Query: 3 KKI IAVDLDGTLLHNNNTISDYTADTLRKVQAQGHKVI 1TTGRPYRMALAHYLRLDLKTP 62
+ + IA+DLDGTLL +N TIS T ' T++K + GH V+I+TGRPYR ++ +Y L L T
Sbjct: RHLIAI.DLDGTIaLTDNKTISMKTKQTIOKAREAGHIvVISTGRPYRASIQYYQELQLDTA 63

+E +IAFGDE ND EM+ +A G AM NA

A+ I +NE+DG+A LE+ L



A related DNA sequence was identified in S.pyogenes <SEQ ID 3421> which encodes the amino acid
sequence <SEQ ID 3422>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4380 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 188/270 (69%), Positives = 224/270 (82%)

Query: 1 MTKKIIAVDLDGTLLHNNIOTISDYTADTLRKVQAQGH^ 60

MTKK+IA+DLDGTLLH++NTIS YT T++ VQ +GH VI I +TGRPYRMAL +YL+L+LK
Sbjct: 1 MTKKLI AIDLDGTLLHHDNTI STYTQKTI KAVQDKGHEVI I STGRPYRMALGYYLQLNLK 60

Query: 61 TPMINFNGALTHIPEKKWAFERSATIDKKLLLETU3LSDAIQADFIASEYRKNPYITMDN 120

TP+I FNGALTH+PE+KWA+E + T+DK Lb L D Q DFIASEYRKN YITM N
Sbjct: 61 TPIITMGALTHMPEQKWAYEHNVTLDKGYLLRLLKYQDDFQMDFIASEYRKNVYITMTN 120

Query: 121 RDKINPQIjFGVNEITDKMALDVTKITRNPNALLMQTRHKDKYELAKELRQHEHHELEVDS 180 

+ I+PQLFGV+EIT MAL++TKITRNPNALLMQT H+DKY LAIC +R F E+E+DS 
Sbjct: 121 PESIDPQLFGVDEITQDMALEITKITRNPNALLMQTHHEDKYALAKNMRACFKDEIEIDS 180 



Query: 181 W 

WGGPLNILE S K VNKAYAL +LL N+ +++LIAFGDEHNDTEMLAFA TGYAMKNA 
Sbjct: 181 WGGPIiNILEISSKNWKAYALOTLLGIYM^KKDLIAFGDEHM)TEM]^FAGTGYAMKNA 240 

Query: 241 NPTLLPYADQQIQWTNEEDGVAKTLEKLLL 270 

+P LLPYADQQ+ ++NEEDGVAK LE+L L 
Sbjct: 241 SPvLLPYADQQLNFSNEEDGVAKKLEELFL 270 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1106 

A DNA sequence (GBSx1181) was identified in S.agalactiae <SEQ ID 3423> which encodes the amino
acid sequence <SEQ ID 3424>. Analysis of this protein sequence reveals the following:



Possible site: 39

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.38 Transmembrane 96 - 112 ( - 119)
INTEGRAL Likelihood = -S.58 Transmembrane 28 - ( 27 - 47)
INTEGRAL Likelihood = -6.26 Transmembrane 176 - 192 ( 174 - 193)
INTEGRAL Likelihood = -5.26 Transmembrane 127 - 143 ( 126 - 144)
INTEGRAL Likelihood = -1.59 Transmembrane 4 - 20 ( - 20)
INTEGRAL Likelihood = -0.22 Transmembrane 60 - ( 59 - 78)

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
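
The "Final Results" lines in these examples pair each predicted compartment with a certainty score and a verdict. As an illustration only, and not part of the original analysis, the following minimal Python sketch reads such lines and reports the compartment with the highest certainty. It assumes the layout `bacterial <compartment> --- Certainty=<value>(<verdict>)`; the regular expression and the function name `parse_final_results` are hypothetical, and stray spaces inside the number are tolerated because scanned text often contains them.

```python
import re

# Assumed line layout: "bacterial <compartment> --- Certainty=<value>(<verdict>)".
# The number may contain stray spaces in scanned text, so spaces are allowed
# inside it and stripped before conversion.
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\(\s*([^)]+?)\s*\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for every matching line."""
    results = {}
    for compartment, number, verdict in RESULT_RE.findall(text):
        results[compartment] = (float(number.replace(" ", "")), verdict)
    return results


example = """
bacterial membrane Certainty=0 .3951 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>
"""

results = parse_final_results(example)
# Compartment with the highest certainty score.
best = max(results, key=lambda name: results[name][0])
print(best, results[best])
```

The sketch only ranks the scores that are printed; it makes no claim about how the prediction program derives them.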



Example 1107 

A DNA sequence (GBSx1182) was identified in S.agalactiae <SEQ ID 3425> which encodes the amino
acid sequence <SEQ ID 3426>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2025 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.



WO 02/34771 PCT/GB01/04789 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1108 

A DNA sequence (GBSx1183) was identified in S.agalactiae <SEQ ID 3427> which encodes the amino
acid sequence <SEQ ID 3428>. This protein is predicted to be an integral membrane protein. Analysis of
this protein sequence reveals the following:

Possible site: 22

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.41 Transmembrane 180 - 196 ( 179 - 199)
INTEGRAL Likelihood = -5.31 Transmembrane 96 - 112 ( 94 - 114)
INTEGRAL Likelihood = -2.18 Transmembrane 129 - 145 ( 129 - 145)
INTEGRAL Likelihood = -1.33 Transmembrane 37 - 53 ( 37 - 53)

Final Results

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8729> which encodes amino acid sequence <SEQ ID 8730>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: 5.85
GvH: Signal Score (-7.5): -2.39
Possible site: 18

>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 4 value: -5.41 threshold: 0.0

INTEGRAL Likelihood = -5.41 Transmembrane 176 - 192 ( 175 - 195)
INTEGRAL Likelihood = -5.31 Transmembrane 92 - 108 ( 90 - 110)
INTEGRAL Likelihood = -2.18 Transmembrane 129 - 145 ( 129 - 145)

PERIPHERAL Likelihood = 0.05 57
modified ALOM score: 1.58

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC65028 GB:AE001188 conserved hypothetical integral membrane 
protein [Treponema pallidum] 
Identities = 54/190 (28%), Positives = 93/190 (48%), Gaps = 14/190 (7%) 

Query: 14 LFFIVISFGIKYYHLQG- -PNLIWNMTLALIALDFAYLTSL--FKKKILIGLFALAWFFF 69 

+F +++SFG + L+WN+ LA I ++ + F+ + LWF 

Sbjct: 3 VFCLLLSFGRRCVAADNFLSFL^/WNLVLAFIPVILISAILHVRRFAVRSVQLFLMLLWLLF 62 

Query: 70 YPNTFyMLTDIIHMHFVGDVLYNKTNLILYILYVSSILFGFLSGIESFSVIMRKFRISNI 129 

+PN Y+LTDIIH+ h +IL + + + F+S S++ R F I 

Sbjct: 63 FPNAPYILTDIIHLGKGKSFLLYYDLIILLAYSFTGLFYAFVSLHLIESILARDFHIKRP 122 

Query: 130 FLRWGIIGIVSL-VSSFGIHIGRYARLNSWDILTKPQWTNELLAVPSR DSFHFI 183 

F II + L + +FGI++GR+ R NSWDI+ + +++++ R D++ F+ 

Sbjct: 123 F IISVFELYLCAFGIYLGRFLRWNSWDIVLHGRTILSDIGIRVIRPVFYVDTWMFV 178 



Query: 184 LGFTFLQVLC 193

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
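
Each homology hit in these examples is summarised by a statistics line of the form "Identities = a/b (p%), Positives = c/b (q%), Gaps = g/b (r%)". As a hedged illustration only, the Python sketch below extracts the counts and the reported percentages from such a line and prints the ratio implied by the counts next to the reported figure. The function name `parse_alignment_stats` and the regular expression are assumptions made for this sketch; no claim is made about how the alignment program rounds its percentages.

```python
import re

# Matches e.g. "Identities = 54/190 (28%)"; whitespace around the
# separators is optional because scanned text spaces them inconsistently.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, reported_percent)}."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_RE.findall(line)
    }


# Statistics line reported for the Treponema pallidum hit in Example 1108.
line = "Identities = 54/190 (28%), Positives = 93/190 (48%), Gaps = 14/190 (7%)"
stats = parse_alignment_stats(line)

for name, (count, length, percent) in stats.items():
    ratio = 100 * count / length
    print(f"{name}: {count}/{length} = {ratio:.1f}% (reported {percent}%)")
```

A line without a "Gaps" field, such as the GAS/GBS alignment lines that list only identities and positives, simply yields a dictionary without that key.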

Example 1109 

A DNA sequence (GBSx1184) was identified in S.agalactiae <SEQ ID 3429> which encodes the amino
acid sequence <SEQ ID 3430>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.79 Transmembrane 171 - 187 ( 166 - 191)

Final Results

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
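
The "INTEGRAL" lines above list each predicted transmembrane segment as a likelihood score, a core residue range and, in parentheses, an extended range. The following Python sketch is an illustration under stated assumptions: it parses lines of exactly that layout and reports the core segment length. The `Segment` tuple, the field names and `parse_segments` are hypothetical names introduced here, not identifiers from the prediction program.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float   # more negative values are listed first in the examples
    start: int          # first residue of the core range
    end: int            # last residue of the core range
    outer_start: int    # first residue of the parenthesised range
    outer_end: int      # last residue of the parenthesised range


SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one Segment per complete INTEGRAL line found in text."""
    return [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in SEG_RE.findall(text)
    ]


example = """
INTEGRAL Likelihood = -6.79 Transmembrane 171 - 187 ( 166 - 191)
INTEGRAL Likelihood =-10.46 Transmembrane 193 - 209 ( 191 - 214)
"""

segments = parse_segments(example)
for seg in segments:
    core_length = seg.end - seg.start + 1
    print(seg.likelihood, seg.start, seg.end, core_length)
```

Lines with missing or garbled fields do not match the pattern and are skipped rather than guessed at.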

Example 1110 

A DNA sequence (GBSx1185) was identified in S.agalactiae <SEQ ID 3431> which encodes the amino
acid sequence <SEQ ID 3432>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.46 Transmembrane 193 - 209 ( 191 - 214)
INTEGRAL Likelihood =-10. Transmembrane 99 - 115 ( 96 - 119)
INTEGRAL Likelihood = -8 Transmembrane 454 - 470 ( 451 - 472)
INTEGRAL Likelihood = -6 Transmembrane 216 - 232 ( 212 - 236)
INTEGRAL Likelihood = -6 Transmembrane 49 - 65 ( 43 - 68)
INTEGRAL Likelihood = -4 Transmembrane 362 - 378 ( 357 - 383)
INTEGRAL Likelihood = -3 Transmembrane 385 - 401 ( 385 - 402)
INTEGRAL Likelihood = -2. Transmembrane 275 - 291 ( 275 - 291)
INTEGRAL Likelihood = -1.70 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF95422 GB:AE004299 conserved hypothetical protein [Vibrio cholerae] 
Identities = 193/471 (40%), Positives = 286/471 (59%), Gaps = 42/471 (8%)

Query: 1 MEKFFKLKEHGTTIRTEITAGLTTFFAMSYILFVNPAILSQTGMPAQGVFLATIIGAWA 60 

+EK FKL E+GT +RTEI AG+TTF M+YI+FVNPAILS GM VF+AT + A + 
Sbjct: 2 LEKLFKlSEyGTNWTEILAGWTFLTmYIIFVNPAILSDAGMDRGAVFVATCLAAAIG 61 

Query: 61 TSV^^FYANLPYAQAPGMG!JNAFFTYTWFAIJGYTWQESUJA^WFICX3LISLIITLTKVRK 120 

+M F AN P AQAPGMGLNAFFTY W +G+TWQ ALA VF G++ ++++L K+R+ 
Sbjct: 62 CFIMGFIANYPIAQAPGMGLNAFFTYGVVLGMGHTWQVALAAVFCSGVLFILLSLFKIRE 121

Query: 121 MIIESIPTTLKSAITAGIGTFLAYVGIKNAGFLKFSIDPGTYDWGKGAAKGIATITANS 180

II SIP +L++ I+AGIG FLA++ +KNAG + +P T +V GA L + 
Sbjct: 122 WIINSIPHSLRTGISAGIGLFLAFIAIiKtjaGIV--VDNPAT--L,VSLGAITSLHAV 173 

Query: 181 SATPGLVSFDNPAILLSLIGLSITIFFIVKGIRGGIILSILTTTLLGILMGWKLDAINW 240 

L+ +G +TI + +G+-K3 ++++IL T LG++ G V+ 1 
Sbjct: 174 LAAVGFFLTIGLVYRGWGAVMIAILAVTALGLVFGDVQWGGIMS 218 

Query: 241 EATNLSASFRDLKQVFGVALGEKGLISLFSNPSRLPSVLMAILAFSLTDIFDTIGTLIGT 300 

+++ +F Q4- A+ E G+IS+ + AF D+FDT GTL+G 

Sbjct: 219 TPPSIAPTF MQLDFSAVFEIGMISV VFAFLFVDLFDTAGTLVGV 262 

Query: 301 GEKVGILATTGDNHESKSLDKALYSDLIGTTFGAICGTSNVTTYVESAAGIGAGGRTGLT 360 

K G++ G + L++AL +D T+ GA+ GTSN T+Y+ES +G+ GGRTGLT 
Sbjct: 263 ATKAGLIEKDG KIPRLNRALLADSTATSVGALLGTSNTTSYIESVSGVAVGGRTGLT 319 

Query: 361 ALWAGLFAISSFFSPLVSIVPSQATAPILVIVGIMMLSNLKDIKWDDMSEAIPAFFTSL 420 

A+W LF ++ FFSPL ++P+ ATA h V I+M+S L I W D++EA P T L 
Sbjct: 320 AVWGILFLLALFFSPLAGMIPAYATAGALFYVAILMMSGLVSIDWRDLTEAA.PTWTCL 379 

Query: 421 FMGFTYSITYGIAAGFLTYTLAKVIKGQAKDIIWV1WILDILFILNFIS1A 471 

M T+SI GI+ GF+ Y K+ G+ + + + +W++ +F++ +1 A 
Sbjct: 380 MMPLTFSIAEGISLGFIAYAAIKLFSGKGRSVSLSW!VI1AAIFVIKyiLAA 430 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3433> which encodes the amino acid
sequence <SEQ ID 3434>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood / Transmembrane: ten segments listed (values not recoverable)

Final Results

bacterial membrane --- Certainty=0.5528 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB04327 GB:AP001509 unknown conserved protein [Bacillus halodurans]
Identities = 192/485 (39%), Positives = 276/485 (56%), Gaps = 53/485 (10%)

Query: 1 MEKFFKLSENGTTVSTEIMAGLTTFFAMSYILB'VNPSILGAAGMPSNAVFLATIIAAAIS 60 

M+++F E+GTT E +AGLTTF +M+YILFVMP ILG AGM AVF+AT +AAAI 
Sbjct: 1 MDRYFGFKEHGTTYGRESIAGLTTFLSMAYILF\'NPLILGDAGMDVQAVFMATALAAAIG 60 

Query: 61 TLIMGLFANVPYAIoAPGMGIjNAFFTrrWFALRFSWQEALAMVFICGLFNIFITVTKFRK 120 

TLIMG+ A P ALAPGMGLNAFF Y4-W + WQ AL VF+ G+ I ITV K R+ 
Sbjct: 61 TLIMGIIAKYPIALAPGMGLNAFFAYSWIGMGIDWQLALFGVFVSGIIFILITVFKIRE 120 

Query: 121 SIIKAIPVSLQHAIGGGIGVFVAYIJGFKIS7^IITFSISAENIV^WNGVEPAKASAKTFAD 180 

II AIP L++A GIG+F+A++G KNA 1+ 
Sbjct: 121 VIINAIPAELKNAAAAGIGLFIAFIGLKNAGIW 154 

Query: 181 GLLFVDSNGGWPTISSFTDSGVLLAIFGLLLTTALVIRNFRGAILIGIVATTLVGIPLG 240 

++ ++ + LLA FGL++T ++R +G I G++ T +VG+ G 

Sbjct: 155 SDEATAVSLGHIIiNGPTLIACTCLIVTVLFMVRGIQGGIFYGMILTAIVGLISG 208

Query: 241 IVDVSNLNFGISHIGEAWTELGTTFLAAFD-GLSSLFSDSSRLPLVFMTIFAFSLSDTFD 299

1+ + I L TF AF+ ++ +FS + + F D FD 

Sbjct: 209 IITYTG GGIVSTPPSLAPTFGQAFNIQMADVFSVQ FLIWLTFLFVDFFD 258 

Query: 300 TIGTFIGTGRRTGI FSQDDENALENS IGFSSKMDRALFADAIGTSIGALVGTSHTTTYVE 359 

T GT G + G F +D++ + +AL AD+ TSIGA++GTS TT Y+E 

Sbjct: 259 TAGTLYGVANQAG-FIKDNK LPRAGKALLADSSATSIGAILGTSTTTAYIE 308 

Query: 360 SAAGIAEGGRTGLTAVSTAVCFLLSILLLPLVGIVPAAATAPALIIVGVMMVSSFLDVNW 419 

S+AG+A GGRTG ++ TA F+L++ PL+ +V TA ALI+VG++M SS ++W 
Sbjct: 309 SSAGVAAGGRTGFASIVTAGLFVLAMFFSPLLSVVTEQVTAA&LIWGILMASSLRFIDW 368 

Query: 420 SKFADALPAFFAAFFMALCYSISYGIAAAFIFYCLVKWEGKTKDIHPIIWGATFLFIVN 479 

+K A+P+F M L YSI+ GIA F+FY + +V+G+ K++HPI++ F+F+ 

Sbjct: 369 TKLEIAI PSFLTWAMPLTYS IATGIAFGFLFYPITMIVKGRGKEVHPIMYALFFVFLAY 428 

Query: 480 FIILT 484 
FI L+ 

Sbjct: 429 FIFLS 433 
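
In these pairwise alignments the middle line marks each column: a letter where query and subject carry the same residue, "+" where the substitution scores positively, and a blank otherwise. The "Positives" figure counts identical columns as well as "+" columns. As a small, hedged Python sketch (the function name `count_midline` is hypothetical, and the three strings are assumed to be column-aligned and of equal length), the counts for one alignment block can be recomputed as follows, using the last block of the alignment above:

```python
def count_midline(query, midline, subject):
    """Return (identities, positives, gap_columns) for one alignment block.

    The three strings must be column-aligned and of equal length.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("query, midline and subject must have equal length")

    # A column is an identity when the midline repeats the shared residue.
    identities = sum(
        1
        for q, m, s in zip(query, midline, subject)
        if q == m == s and q != "-"
    )
    # "+" marks a conservative substitution; positives include identities.
    positives = identities + midline.count("+")
    # A gap column has "-" in either sequence.
    gap_columns = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gap_columns


# Last block of the alignment above (query 480-484, subject 429-433).
counts = count_midline("FIILT", "FI L+", "FIFLS")
print(counts)
```

Summing these per-block counts over a whole alignment gives totals comparable with the statistics line, provided the scanned sequence lines are intact, which is not the case for every block in this text.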

An alignment of the GAS and GBS proteins is shown below. 

Identities = 258/488 (52%), Positives = 336/488 (67%), Gaps = 17/488 (3%)

Query: 1 MEKFFKLKEHGTTIRTEITAGLTTFFAMSYILFVNPAILSQTGMPAQGVFLATIIGAWA 60 

MEKFFKL E+GTT+ TEI AGLTTFFAMSYILFVNP+IL GMP+ VFLATII A ++ 
Sbjct: 1 MEKFFKLSENGTTVSTEIMAGLTTFFAMSYILFVNPSILGAAGMPSNAVFLATIIAAAIS 60 

Query: 61 TSVmFYANLPYAQAPGMGIJ^FTYTWFALGYTWQFAIiAMVFICGLISLIITLTKVRK 120
T +M +AN+PYA APGMGLNAFFTYTWFAL ++WQEALAMVFICGL ++ IT+TK RK 
Sbjct: 61 TLIMGLFAWPYAIAPGMGIMAFFTY1WFALRFSWQEALAMVFICGLFNIFITVTKFRK 120 

Query: 121 MIIESIPTTLKSAITAGIGTFLAYVGIKNAGFLKFSIDPGTYDW— --GKGAAK 171 

II++IP +L+ AI GIG F+AY+G KNA + FSI +V K A 

Sbjct: 121 SIIKAIPVSLQHAIGGGIGVFVAYLGFKNANIITFSISAENIVMVNGVEPAKASAKTFM 180 

Query: 172 GLATITANSSATPGLVSFDNPAILLSLIGLSITIFFIVKGIRGGIILSILTTTLLGILMG 231 

GL + AN P + SF + +LL++ GL +T +++ RG I++ 1+ TTL+GI +G 
Sbjct: 181 GLLFVDANGGWPTISSFTDSGVLIAIFGLLLTTALVIRNFRGAILIGIVATTLVGIPDG 240 

Query: 232 WKLDAINWEATNLSASFRDLKQVFGVALGEKGLISLFSNPSRLPSVLMAILAFSLTDIF 291 

+V + +N+ ++4 ++ +L FA GL SLFS+ SRLP V M I AFSL+D F 
Sbjct: 241 IVDVSNLNFGISHIGEAWTELGTTFLAAF--DGLSSLFSDSSRLPLVFMTIFAFSLSDTF 298 

Query: 292 DTIGTLIGTGEKVGILATTGDN HESKSLDKALYSDLIGTTFGAICGTSKVTTYV 345 

DTIGT IGTG + GI + +N S +D+AL++D IGT+ GA+ GTSN TTYV 

Sbjct: 299 DTIGTFIGTGRRTGIFSQDDENALENSIGFSSKMDRALFADAIGTSIGALVGTSNTTTYV 358 

Query: 346 ESAAGIGAGGRTGLTALWAGLFAISSFFSPLVSIVPSQATAPILVIVGIMMLSNLKDIK 405 

ESAAGI GGRTGLTA+ A F +S PLV IVP+ ATAP L+IVG+MM+S+ D+ 
Sbjct: 359 ESAAGIAEGGRTGLTAVSTAVCFLLSILLtiPLVGIVPAAATAPALIIVGVMMVSSFLDVN 418 

Query: 406 WDDMSEAIPAFFTSLFMGFTYSITYGIAAGFIiTYTtAKVIKGQAKDIHWLWILDILFIIi 465 

W ++A+PAFF + FM YSI+YGIAA F+ Y L KV++G+ KDIH ++W LFI+ 
Sbjct: 419 WSKFADALPAFFAAFFMALCYS I SYGIAAAFI FYCLVKWEGKTKDI HPI IWGATFLFI V 478 

Query: 466 NFISLAIL 473 

NFI L IL 
Sbjct: 479 NFIILTIB 486 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1111

A DNA sequence (GBSx1186) was identified in S.agalactiae <SEQ ID 3435> which encodes the amino
acid sequence <SEQ ID 3436>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3221 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 27 MFYTQNEEELIALGQKLGTVLKSGDIVLLTC5NL3AGKTTLTKGIAKGLD1KQM1KSPTYT 86 

M TQ+ E +A QKL h +GD4+ h G+LGAGKT+ TKG+A GL IK+++KSPT+T 
Sbjct: 5 miTQSPEATMAFAQKIJU3KLLAGDVITLEGDLGAGKTSFTKGLALGLGIKRWKSPTFT 64

Query: 87 I vREYEGRVPLYHLrmRIGDDPDSIDLDDFLFGQGVTVlEWGELLSDNLINNYLEIVIT 146

I+REY+GR+PLYH+DVYR+ ++ + + D++ G GVTV+EW L+ L L I IT
Sbjct: 65 IIREYKGRLPLYHMDvYRLNEEEEDLGFDEYFHGDGVTWEWASLIEGRLPPVRLAITIT 124

Query: 147 RSNQG- RQVQLEAYGHRAREI IEAIQD 172
+ + RQ+ AYG R E4-++ + D

Sbjct: 125 HAGENERQLSFTAYGERWEEVLKELLD 151 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3437> which encodes the amino acid 
sequence <SEQ ID 3438>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1202 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 97/142 (68%), Positives = 122/142 (85%)

Query: 27 MFYTQNEEEL I JALGQKLGTVLKSGDI VLLTGNLGAGKTTLTKGI AKGLDI KQMI KS PTYT 86 

MFY++NE L A G+ LGT L GD+++L+G+LGAGKTTL KGIAKG+ I QMIKSPTYT 
Sbjct: 1 MFYSENEYTLKAYGETLGTYLSIGDVIVLSGDLGAGKTTLAKGIAKGMGISQMIKSPTYT 60 

Query: 87 IVREYEGRVPLYHLDVYRIGDDPDSIDLDDFLFGQGVTVIEWGELLSDNLINNYLEIVIT 146 

IVREYEGR+PLYHLD+YR+GDDPDSIDLDDFLFG GVTVIEWGELL + L+ +YL+I IT 
Sbjct: 61 IVREYEGRLPLYHLDIYRVGDDPDSIDLDDFLFGNGVTVIEWGEIjLGEGLLQDYLQITIT 120 

Query: 147 RSNQGRQVQLEAYGHRAREIIE 168 

+ +4-GRQ+ L A+G R+R+++E 
Sbjct: 121 KRDKGRQLDLLAHGERSRQLLE 142 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1112

A DNA sequence (GBSx1187) was identified in S.agalactiae <SEQ ID 3439> which encodes the amino
acid sequence <SEQ ID 3440>. Analysis of this protein sequence reveals the following:




Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1752 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35662 GB:AE001732 conserved hypothetical protein [Thermotoga maritima]
Identities = 56/163 (34%), Positives = 94/163 (57%), Gaps = 1/163 (0%)



Query: 24 E1ASREELASAILEFLNTVTEETDFILHTVSNQLSLS3METFIENTLMTKNCICLIAKLKNK 83
EAS +A I+E+L VT ETDF++ +S +1 + ++ ++ +
Sbjct: 18 EASIWDARRIVEYLKEVTSETDFLITRPDEVyDVSTERNYIRMYRSNPGKLMIVGEINRE 77

Query: 84 VIGLITI ISQSDIEIEHVGDLFIAVQKDYWGYGIGHILMEEAIEWASDNDITRRLELSVQ 143
++ L+T +HVG++ I+V+K YW GIG ++ AIEWA N R++L V
Sbjct: 78 IVSLLTFTGFGRKRTKHVGEIGISVKKRYWNIGIGTRMITSAIEWARRNGFI -RIQLEVL 136

Query: 144 GRNERAIHLYQKFGFEIDGLQTRGIKRENGEFLDIYRMSKLID 186
NERAI LY+K GFE++G++ + ++R++G F D+ M+ L+D
Sbjct: 137 KSNERAISLYRKLGFELEGIKRKAVRRDDGSFEDVI1VMAI1I1I1D 179

There is also homology to SEQ ID 1724.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1113 

A DNA sequence (GBSx1188) was identified in S.agalactiae <SEQ ID 3441> which encodes the amino
acid sequence <SEQ ID 3442>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15582 GB:Z99122 membrane-bound protein [Bacillus subtilis]
Identities = 108/324 (33%), Positives = 178/324 (54%), Gaps = 33/324 (10%)





Query: 5 KKITLMFSAIILTTVIALGV--YVASAYNFSTNELSKTFKDFKLAKS--KSHAIEETKPF 60
KK TL+ + + + ++ LG Y ++ + + ++ + +K K +1 + PF
Sbjct: 8 KKKTLLLTILTIIGLLVLGTGGYAYYLWHKAASTVASIHESIDKSKKRDKEVSINKKDPF 67

Query: 61 SILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSLERDVLIKLSGPIQJNGQTG
S+L+MGVD + G +D++I +T+NPKTN T M S+ RD K+ G G
Sbjct: 68 SVLIMGVDERDGDK GRADTLI YMTVNPKTNTTDMVSI PRDTYTKI IGK G

Query: 121 VEAKLNAAYASGGAEMALMTVQDLLDINVOT 180
K+N +YA GG +M + TV++ LD+ VDYF+++NM+ D+V+ +GGITV + F F
Sbjct: 117 TMDKINHSYAFGGTQMTVDTVENFLDVPVDYFVKVNMESFRDVVDTLGGITVNSTFAFSY 176

Query: 181 SIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGRQKRQREVIQKVLKKILAL 240
+ G +NG++AL Y+RMR +DP GD+GRQ RQR+VIQ ++ K +
Sbjct: 177 DGYS FGKGEITLNGKEALAYTRMRKEDPRGDFGRQDRQRQVIQGIINKGANI 228

Query: 241 NSISSYKKILSAVSNNMQTNIEISSKTIPNIi LAYKDSLEHI KSYQLKGEDATLSDG 296
+SI+ + + V NN++TN+ T N+ YK + +HIK ++LKG T +G
Sbjct: 229 SSITKFGDMPKWENNVKTNL TPDNMWDIQSDYKGAEKHIKQHELKG-TGTKING 282

Query: 297 GS YQI LTKKHLLAVQNRI KKELDK 320
y + L + +K+ L+K
Sbjct: 283 IYYYQADESALSDITKELKESLEK 306

A related DNA sequence was identified in S.pyogenes <SEQ ID 2763> which encodes the amino acid 
sequence <SEQ ID 2764>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 288/436 (66%), Positives = 342/436 (78%), Gaps = 22/436 (5%)

Query: 1 MKIWKKITLMFSAIILTWIALGVWASAYNFST^LSKTPKDFKLAKSKSHAIEETKPF 60 

MKI KKI LMF+AI +LTTV +ALGVY+ SAY FST ELSKTFKDF + +KS AI ++T+ F 
Sbjct: 1 MKIGKKI\7LMFTAIVLTTVIiALGVYLTSAYTFSTGELSKTFKDFSTSSNKSDAIKQTRAF 60 

Query: 61 S I LLMGVDTGSEHRKS KWSGNSDSMI L VT INPKTNKTTMTSLERDVLIKLSGPKNNGQTG 120 

SIIiLMGVDTGS R SKW GNSDSMILVT+NPKT KTTMTSLERD L LSGPKNN G 
Sbjct: 61 SILLMGVDTGSSERASKWEGNSDSMILVTVNPKTKKTTMTSLERDTLTTLSGPKNNEMNG 120 



Query: 121 VEAKLNAAYASGGSAEMALMWQELLDINVDYFMQI^QGLVDLWAVGGITVTNKFDFPI 180 

VEAKLNAAYA+GGA+MA+MTVQDLL+I +D ++QINMQGI.+DLVNAVGGITVTN+FDFPI 
Sbjct: 121 VEAKLNAAYAAGGAQMAIMTVQDLIJSIITIDNWQINMQGL1DLVNAVGGITVTNEFDFPI 180 

Query: 181 SIAMffiPEYKAVVEPGTHKIM^CJU,VYSRmYDDPEGDYGRQKRQREVIQKVLKKILftL 240 

SIA NEPEY+A V PGTHKINGEQALVY+RMRYDDPEGDYGRQKRQREVIQKV1KKILAL 
Sbjct: 181 SIAENEPEYQAWAPGTHKlNGEQALVYARMRYDDPEGDYGRQKRQREVIQKVIjKKIIiAL 240 

Query: 241 NSISSYKKILSAVSNNMQTNIEISSKTIPN-jIAYKDSLEHIKSYQLKGEDATLSDGGSYQ 300 

+SISSY+KILSAVS+NMQTNIEISS+TIP+LI, Y+D+L 1K+YQLKGEDATLSDGGSYQ 
Sbjct: 241 DSISSYRKILSAVSSNMQTNIEISSRTIPSLLGYRDALRTIKTYQLKGEDATLSDGGSYQ 300 

Query: 301 ILTKKHLIAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTASNDSSTYSSTQENNYNTT- 359 

I+T HLL +QNRI+ EL + LKT+A +YE+ YG ST S T NNY+++ 

Sbjct: 301 IVTSNHLLE I QNRIRTELGLHKVNQLKTNATVYENLYG STICSQTVNNNYDSSG 353 



Query: 412 NNHNGAATPNPNTGTQ 427 

G+ P N Q 
Sbjct: 412 GSLVPPANINPQ 423 

SEQ ID 3442 (GBS54) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 16 (lane 8; MW 48.4kDa).

The GBS54-His fusion product was purified (Figure 98A; see also Figure 194, lane 6) and used to immunise
mice (lane 1+2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure 98B),
FACS (Figure 98C), and in the in vivo passive protection assay (Table III). These tests confirm that the
protein is immunoaccessible on GBS and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1114 

A DNA sequence (GBSx1189) was identified in S.agalactiae <SEQ ID 3443> which encodes the amino
acid sequence <SEQ ID 3444>. This protein is predicted to be Vesl-1L. Analysis of this protein sequence
reveals the following:

Possible site: 18

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.44 Transmembrane 3 - 19 ( 3 - 19)

Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3445> which encodes the amino acid 
sequence <SEQ ID 3446>. Analysis of this protein sequence reveals the following: 

Possible site: 15 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below.

Identities = 42/98 (42%), Positives = 64/98 (64%)

Query: 1 MKIGRLIALGLVSLGALELYKNRKTIKDSYONTKNETDSAKLKLERIKNDLAIISQEKEK 60 

MK+ +IA+GL+S A + Y+ R TIK+ ++ D+A+L L+ IK +L +1 + + 
Sbjct: 1 MKVTCTVIAVGLLSFTAYKAYQKRCTIKELLSISRQAKDAAQLDLDNIKANLDLIHSQGKV 60 

Query: 61 IRLISQELNHKFQVFNKDIQPRLEEINQRMAKYQEKDE 98 

1+ ISQ+L HK++ FN++ Q L EI RMAKYQE E 
Sbjct: 61 IQNISQDLAHKWRYFNQETQAHLTEIQNRMAfCYQEDSE 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1115 

A DNA sequence (GBSx1190) was identified in S.agalactiae <SEQ ID 3447> which encodes the amino
acid sequence <SEQ ID 3448>. This protein is predicted to be Hit-like protein involved in cell-cycle
regulation (hit). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2694 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04908 GB:AP001511 Hit-like protein involved in cell-cycle
regulation [Bacillus halodurans]
Identities = 74/137 (54%), Positives = 95/137 (69%), Gaps = 2/137 (1%)






Query: 3 NCIFCKIISGEIPSSKVyEDDEVIAFLDITQTTTGHTLLIPKKHVRMVLEMDEKTAQITF 62
NCIFCKII+GEIPS+ VYEDD V AFLDI+Q T GHTL+IPK H RNV E+ E+ A F
Sbjct: 6 NCIFCKI IAGEIPSATVYEDDHVYAFLDISQVTKGHTLVIPKVHKRNVFELSEEIASSIiF 65

Query: 63 ERLPWARAVQAATKAKGMNIII^MNEEIAGQTVFHAHVHLVPRFDESDGIKIHyTTHEPD 122
+PK++RA+ A + GMNI+NNN E AGQTVFH H+HL+PR+ E DG + H
Sbjct: 66 AAVPKISRAINDAFQPIGMNIVNNNGEAAGQTVFHYHLHLLPRYGEGDGYGAVWKDHSSQ 125

Query: 123 F--EALAKLAKEIRKEI 137
+ + L L+ IR+ +
Sbjct: 126 YSGDDLQVLSSS IREHL 142





A related DNA sequence was identified in S.pyogenes <SEQ ID 3449> which encodes the amino acid 
sequence <SEQ ID 3450>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0125 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 97/137 (70%), Positives = 117/137 (84%)

Query: 1 MDNCI FCKI 1 SGKI PSSKVYEDDEVLAFLDITQTTTGHTLLIPOCHVRNVIiEMDEKTAQI 60

M+NCIFC II G+IPSSKVYED++VLAFLDI+QTT GHTL+IPK+HVRN+LEM +TA
Sbjct: 1 MENCI FCS 1 1 QGDI PSSKVYEDEQVIAFLD I SQTTKGHTLVI PKQHVRNLLEMTAETASH 60

Query: 61 TFERLPKVARAVQAaTKAKG^IINNNEEIAGQTVFHAHVHLVPRFDESDGIKIHYTTHE 120

F R+PK+ARA+Q+AT A MNIINNNE +AGQTVFHAHVHLVPR++E DGI I YTTHE
Sbjct: 61 LFARIPKIARAIQSATGATA^IINNNEALAGQTVFHAHVHLVPRYNEEDGISIQYTTHE 120

Query: 121 PDFEALAKLAKEIRKEI 137
PDF L KLA++I +E+

Sbjct: 121 PDFPVLEKLARQINQEV 137

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1116

A DNA sequence (GBSx1191) was identified in S.agalactiae <SEQ ID 3451> which encodes the amino
acid sequence <SEQ ID 3452>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10923> which encodes amino acid sequence <SEQ ID
10924> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 3452 (GBS87) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 8 (lane 3; MW 19.5kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 15 (lane 10; MW 44kDa). 
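
As a rough consistency check on the two gel bands, the mass of the fusion partner can be subtracted from each observed molecular weight. The sketch below is an illustration under stated assumptions, not a calculation taken from the source: it assumes a GST moiety of about 26 kDa and a His tag of about 1 kDa, and the actual vector-encoded tags and linkers may differ. The names `untagged_estimate`, `GST_KDA` and `HIS_TAG_KDA` are hypothetical.

```python
# Assumed approximate tag masses in kDa (not stated in the source).
GST_KDA = 26.0
HIS_TAG_KDA = 1.0


def untagged_estimate(observed_kda, tag_kda):
    """Estimate the mass of the antigen alone from a fusion-protein band."""
    return observed_kda - tag_kda


# Observed bands for GBS87: His-fusion 19.5 kDa, GST-fusion 44 kDa.
from_his = untagged_estimate(19.5, HIS_TAG_KDA)
from_gst = untagged_estimate(44.0, GST_KDA)
print(from_his, from_gst)
```

Under these assumptions the two estimates differ by well under the resolution of a typical SDS-PAGE gel, which is what would be expected if both bands carry the same antigen.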

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1117 

A DNA sequence (GBSx1192) was identified in S.agalactiae <SEQ ID 3453> which encodes the amino
acid sequence <SEQ ID 3454>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -6.53

Final Results

bacterial membrane --- Certainty=0.3612 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9563> which encodes amino acid sequence <SEQ ID 9564> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12844 GB:Z99109 ABC transporter (ATP-binding protein) 
[Bacillus subtilis] 
Identities = 137/242 (56%) , Positives = 181/242 (74%) 

MTMLKIENVTGGYVNI PVLKNI S FEVNDGELVGLIGLNGAGKSTTINE I IGI LRPYQGD I 6 0 
M++ L ++++TGGY PVLKN+SF + ++VGLIGLNGAGKSTTI IIG++ P++G I 
MSLLSVKDLTGGYTRNPVLKNVSFTLEPNQIVGLIGLNGAGKSTTIRHIIGLMDPHKGSI 60 

TIDGISLEADQELYRKKIGFIPETPSLYESLTLREHLEWAI/IAYDIATDEVMARAQKLLE 120 
+4G + D E YR + +IPETP LYEELTL EHLE AMAY ++ + + R LL+ 



K G S+LMSTH+L +AE+ CD F4-ILH GE+RA GTL ELR FG +A L+D+Y+ LT 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3455> which encodes the amino acid 
sequence <SEQ ID 3456>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

L - 157 ( 139 - 158)

Final Results

bacterial membrane --- Certainty=0.3017 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12844 GB:Z99109 ABC transporter (ATP-binding protein) 
[Bacillus subtilis] 
Identities = 139/241 (57%) , Positives = 189/241 (77%) 



+L++K+LTGGY PVL +VSF+++ ++VGLIGLNGAGKSTTI IIG + P++GSI

R+ +L+WFP +FSKGMKQKVMI+CAF+ 4P+L+I+DEPFLGLDPLAI+ L+4 +

G S+LMSTH+L +AER CD F+ILH+G+VRA+GTL++L+E FG + A+L+D+YL LTKED

An alignment of the GAS and GBS proteins is shown below.

Ident:



ML I+N+TGGY NIPVL ++SF V++GELVGLIGLNGAGKSTTINEI IG L+PYQG 1 + 1 
MLNIKI^TGGYHNIPVljOTVSFSVIJNGELTCLIGIiN^ 60 

DGISLEADQELYRKKIGFIPETPSLYEELTLREHLETVAMAYDIATDEVMARAQKLLEMF 122 
DG++L + YR+KIGFIPETPSLYEELTL EH+ TVAMAYDI + RAQ LEMF 
DGLTLAENAVAYRQKIGFIPETPSLYEELTLSEHINTVAMAYDIDLEVAQKRAQPFLEMF 120 

RLTDKLDWFPMHFSKGMKQKOTIIICAFWSPSLFIVDEPFLGLDPLAISDIjINLLAEEKA 182 
RLTDKL+WFP++FSKGMKQKVMI I CAFV+ PSLFI+DEPFLGLDPLAISDLI L EKA 
RLTDKLEWFPVNFSKGMKQKVMIICAFVIDPSLFILDEPFLGLDPLAISDLIQTLEVEKA 180 

KGKSILMSTHVIjDSAEKMCDRFVILHKGEIEAVGTLEELPAIFGDSNANLNDIYIALTKE 242 
KGKSILMSTHVLDSAE+MCDRFVILH G++RA GTL +L+ FGD +A+LNDIY+ALTKE 
KGKSILMSTHVbDSAEFJ^CDRFVILHHGQVRAQGTbADLQEAFGDRSASIjNDIYIiALTKE 240 

SEQ ID 3454 (GBS353) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 74 (lane 2; MW 30kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 6; MW 55kDa).

GBS353-GST was purified as shown in Figure 216, lane 5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1118 

A DNA sequence (GBSx1193) was identified in S.agalactiae <SEQ ID 3457> which encodes the amino
acid sequence <SEQ ID 3458>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1475 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1119 

A DNA sequence (GBSx1194) was identified in S.agalactiae <SEQ ID 3459> which encodes the amino
acid sequence <SEQ ID 3460>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane

Final Results

bacterial membrane --- Certainty=0.6074 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12845 GB:Z99109 ABC transporter (membrane protein) [Bacillus subtilis] 
Identities = 101/409 (24%), Positives = 187/409 (45%), Gaps = 76/409 (18%) 

Query: 1 MKKLFNKRRSLFLTQNSKYLRYVFNDHFVLVLMFLSGFLLYQYSQLLKDFPKTHWPIIVI 60 
M ++ R + + Y++Y+ NDH V+VL+F YS+ ++D P H+P +

Sbjct: 4 MLDIWQSRLQEHIKETRTYMKYMLNDHLVIVLIFFLAGAASWYSKWIRDIP-AHFPSFWV 62 

Query: 61 VSIIILMLLAMGGIASYLEPADKQFLLIKEEAIKEIINSAKKRTYI 106 

++++ ++L + + L+ AD FLL E ++ + A +Y+ 
Sbjct: 63 MAVLFSLVLTSSYVRTLLKEADLVFLLPLEAICMEPYLKQAFVYSYVSQLFPLIALSIVAM 122

Query: 107 --FWLVIQTLFLVLISPILIKLGL 128 

++ V LV + + ++L L 

Sbjct: 123 PLYFAVTPGASLVSYAAVFVQLLLLKAKNQVMEWRTTFQNDRSMKRMDVIIRFAANTLVL 182 


Query: 129 SVFMITLLIFGLGIIKWLVITYKVKVFYNNQNLNWDAAINHEQERKQSILKFFSL 183 

SV+M LL++ + + +L ++ K + W++ I E RKQ + +L 

Sbjct: 183 YFVFQSVYMYALLVYVIMAVLYLYMSSAAK RKTFKWESHIESELRRKQRFYRIANL 238 

Query: 184 FTNVKGISTSVKRRSFLDGILKLISKTPSRLWINLFVRAFLRSSDYLGLTIRLVTLNILS 243

FT+V + KRR++LD +L+L+ + + +F RAFLRSSDYLG+ +RL + L 

Sbjct: 239 FTDVPHLRKQAKRRAYLDFLLRLVPFEQRKTFAYMFTRAFLRSSDYLGILVRLTIVFALI 298 

Query: 244 VIFVNETYLAIALAFVFN-YLLLFQLIiAI/SHHFDYQYMNQLYPTOIjNAKASQLKGFLRVL 302
+++V+ + L A+ VF ++ QLL L HFD+ + +LYPV+ K ++LK + +L

Sbjct: 299 IMWSASPLIAAVLTVFAIFITGIQLLPLFGHFDHLALQELYPVQ KETKLKSYFSLL 355 



Query: 303 SYAVTVIDSILIRELKPVILLIVLMLIVTEYYIPYKIKK 341

A+++ + L L +1 VL+ +V Y+ ++KK
Sbjct: 355 KTALSIQALLMSVASAYAAGLTGFLYALIGSAVLI7WLPAYMTTRLKK 404



A related DNA sequence was identified in S.pyogenes <SEQ ID 3461> which encodes the amino acid 
sequence <SEQ ID 3462>. Analysis of this protein sequence reveals the following: 



Possible site: 44

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-14.91 Transmembrane 126 - 142 ( 119 - 151)
INTEGRAL Likelihood = -9.77 Transmembrane 320 - 336 ( 311 - 339)
INTEGRAL Likelihood = -6.37 Transmembrane 59 - 75 ( 53 - 79)
INTEGRAL Likelihood = -4.94 Transmembrane 28 - 44 ( 22 - 47)
INTEGRAL Likelihood = -4.73 Transmembrane 250 - 266 ( 249 - 273)
INTEGRAL Likelihood = -4.04 Transmembrane 231 - ( 229 - 243)
INTEGRAL Likelihood = -3.19 Transmembrane 298 - 314 ( 295 - 315)
INTEGRAL Likelihood = -2.28 Transmembrane 103 - 119 ( 103 - 119)

Final Results

bacterial membrane --- Certainty=0.6965 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
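
Each predicted membrane-spanning region in such a report is described by one line giving a likelihood score, a core segment and, in parentheses, its outer limits. A minimal Python sketch for reading those lines is shown below. It is not part of the original analysis: the function name `parse_integral_lines` and the dictionary keys are hypothetical, and the line layout "INTEGRAL Likelihood = <score> Transmembrane <a> - <b> ( <c> - <d>)" is assumed from the report format.

```python
import re

# Assumed layout: INTEGRAL Likelihood =<score> Transmembrane a - b ( c - d)
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(report):
    """Return one dict per complete INTEGRAL line, in order of appearance."""
    segments = []
    for m in _INTEGRAL.finditer(report):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extent": (int(m.group(4)), int(m.group(5))),
        })
    return segments


example = """
INTEGRAL Likelihood =-14.91 Transmembrane 126 - 142 ( 119 - 151)
INTEGRAL Likelihood = -2.28 Transmembrane 103 - 119 ( 103 - 119)
"""

segments = parse_integral_lines(example)
# Sort along the sequence to see the order of the predicted helices.
ordered = sorted(segments, key=lambda seg: seg["core"][0])
print([seg["core"] for seg in ordered])
```

Lines with a missing coordinate do not match the pattern and are left out of the result instead of being completed by guesswork.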



The protein has homology with the following sequences in the databases: 

>GP:CAB12845 GB:Z99109 ABC transporter (membrane protein) [Bacillus subtilis] 
Identities = 96/403 (23%), Positives = 173/403 (42%), Gaps = 78/403 (19%) 

Query: 1 MKALFLKRRQDFQKQQNKYLRYVLNDHFVLVLMFLLGFAMVQYGQLLN HFPT 52 

M ++ R Q+ K+ Y++Y+LNDH V+VL+F LA Y + + HFP+ 
Sbjct: 4 MLDIWQSRLQEHIKETRTYMKYMLITOHLVIVLIFFLAGAASWYSKWIRDIPAHFPSFWVM 63 

Query: 53 - -NHLPIQVCLGILIPLLLSM 71 

L + L L+PL M 

Sbjct: 64 AvLFSLVLTSSYVRTLLKEADLVFLLPLEAKMEPYLKQAFVYSYVSQLFPLIALSIVAMP 123 

Query: 72 GSIATYLEEADQHFLLPKEEEVISYI KQAERLSFLLWGTLQTAVLL 117 

S+ +Y Q LL 4V+ + + +R+ ++ T VL 

Sbjct: 124 LYFAVTPGASLVSYAAVFVQLLLLKAWNQVMEtTOTTFQNDRSMKRMDVIIRFAaNTLVLY 183 

Query: 118 FLYPIFRRLGLSLFIFIILVLILLALKRWLSRKTRYFLRGNRLDWAKAVAFESNRKQSI 177 

F4+ S++++ +LV 4-++A+ + +S + W + E RKQ 

Sbjct: 184 FVFQ SVYMYALLVYVIMAVLYLYMSSAAKR KTFKWESHIESELRRKQRF 232 

Query: 178 LKFYSLFTTVKGISTKVKERTYLNPLLKLVKQTPSNLWLSLYARAFLRSSDYLGLFLRLM 237 

+ +LFT V + + K R YL+ LL+LV + ++ RAFLRSSDYLG+ +RL 

Sbjct: 233 YRIANLFTDVPHLRKQAKRRAYLDFLLRLVPFEQRKTFAYMFTRAFLRSSDYLGILVRLT 292 

Query: 238 LLSSLSVFFIHNLYLSVSLALIFN-YLWFQLLSLYYHYDYHYMTSLYPENSRSKKKNML 296 

++ +L + ++ L ++ +F ++ QLL L+ H+D+ + LYP +K K+ 
Sbjct: 293 IVFALIIMYVSASPLIAAVLTVFAIFITGIQLLPLFGHFDHIALQELYPVQKETKLKSYF 352 

Query: 297 SFLR-GLSFLMLIVNMLCCSSAPKA--LILIVGMVFIACIYLP 336 

S L+ LS L++++ +A L ++G + + LP 

Sbjct: 353 SLLKTALSIQALLMSVASAYAAGLTGFLYALIGSAVLIFWLP 395



An alignment of the GAS and GBS proteins is shown below.

Identities = 170/344 (49%), Positives = 237/344 (68%)



Query: 1 MKKLFNKRRSLFLTQNSKYLRYVFNDHFVLVLMFLSGFLLYQYSQLLKDFPKTHWPIIVI 60
MK LF KRR F Q +KYLRYV NDHFVLVLMFL GF + QY QLL FP H PI V
Sbjct: 1 MKALFLKRRQDFQKQQNKYLRYVLNDHFVLVLMFLLGFAMVQYGQLLNHFPTNHLPIQVC 60



Query: 61 VSIIILMLLAMGGIASYLEPADKQFLLIKEEAIKEIINSAKKRTYIFWLVIQTLFLVLIS 120

+ I+I +LL+MG IA+YLE AD+ FLL KEE + I A++ +++ W +QT L+ +

Sbjct: 61 LGILIPLLLSMGSIATYLEEADQHFLLPKEEEVISYIKQAERLSFLLWGTLQTAVLLFLY 120

Query: 121 PILIKLGLSVFMITLLIFGLGIIKWLVITYKVKVFYNNQNLNWDAAINHEQERKQSILKF 180

Query: 181 FSLFTNVKGISTSVKRRSFLDGILKLISKTPSRLWTNLFVRAFLRSSDYLGLTIRLVTLN 240

+SLFT VKGIST VK R++L+ +LKL+ +TPS LW +L+ RAFLRSSDYLGL +RL+ L+ 
Sbjct: 181 YSLFTTVKGISTKVKERTYLNPLLKLVKQTPSNLWLSLYARAFLRSSDYLGLFLRLMLLS 240 

Query: 241 ILSVIFA71ffiTYIlMlAIlAFVFNYLLLFQLliALGHHFDYQY^lNQLYPWIJNAKASQLKGFLR 300 

LSV F++ YL+++LA +FNYL++FQLL+L +H+DY YM LYP +K + FLR 
Sbjct: 241 SLSVFFIHNLYLSVSLALIFNYLWFQLLSLYYHYDYHYMTSLYPENSRSKKKNMLSFLR 300 



Query: 301 VLSYAVTVIDSILIRELKPVILLIVLMLIVTEYYIPYKIKKMID 344 

LS+ + +++ + ++LIV M+ + Y+PYK+KK+ID 

Sbjct: 301 GLSFLMLIVNMLCCSSAPKALILIVGMVFIACIYLPYKLKKIID 344 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1120 

A DNA sequence (GBSx1195) was identified in S.agalactiae <SEQ ID 3463> which encodes the amino
acid sequence <SEQ ID 3464>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2821 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00284 GB:AF008220 YtmP [Bacillus subtilis]

Identities = 69/214 (32%), Positives = 121/214 (56%), Gaps = 1/214 (0%)

Query: 12 PLRGKSGKAYIGTYPNGERVFVKYNTTPILPALAKEQIAPQLLWARRTSNGDMMSAQEWL 71
P G +G AY + NG+++F+K N++P L L+ E I P+L+W +R NGD+++AQ W+
Sbjct: 20 PAGGATGDAYYAKH-NGQQLFLKl^SSPFLAvLSAEGIVPKLVWTKEMENGDVITAQHWM 78

Query: 72 DGRTLTKEDMGSKQIIHILLRLHKSRPLVNQLLQLGYKIENPYDLLMDWEKQTPIQIREN 131
GR L +DM + + +L ++H S+ L++ h +LG t BP LI 44 + +
Sbjct: 79 TGRELKPKDMSGRPVAELLRKIHTSKALLDMLKRLGKEPLNPGALLSQLKQAVFAVQQSS 138

Query: 132 TYLQSIVTELKRSLPEFRTEVATIVHGDIKHSNWVTTTSGLIYLVDWDSVRLTDRMYDVA 191
+Q + L+ L E + H D+ H+NW+++ +YL+DWD + D D+
Sbjct: 139 PLIQTCIKYLEEHLHEVHFGEKWCHCDVNHl^LLSEDNQLYLIDWDGAMIADPAMDLG 198

Query: 192 YILSHYIPQKHWKDWLSYYGYKDNEKVWSKIIWY 225
+L HY+ + W+ WLS YG + E + ++ WY
Sbjct: PLLYHYVEKPAWESWLSMYGIELTESLRLRMAVrY 232


A related DNA sequence was identified in S.pyogenes <SEQ ID 3465> which encodes the amino acid
sequence <SEQ ID 3466>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.268S (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 214/262 (81%), Positives = 242/262 (91%)



Query: 1 MTISNQELTLTPLRGKSGKAYIGTYPNGERVFVKYNTTPILPALAKEQIAPQLLWARRTS 60
+T + QELTLTPLRGKSGKAY GTYPNGE VF+K NTTPILPALAKEQIAPQLLWA+R
Sbjct: 1 VTTTEQELTLTPLRGKSGKAYKGTYPNGECVFIKLNTTPILPAIAKEQIAPQLLWAKRMG 60

Query: 61 NGDMMSAQEWLDGRTLTKEDMGSKQIIHILLRLHKSRPLVNQLLQLGYKIENPYDLLMDW 120
NGDMMSAQEWL+GRTLTKEDM SKQIIHILLRLHKS+ LVNQLLQL YKIENPYDLL+D+
Sbjct: 61 NGDMSAQEWI^GRTLTKEDMNSKQIIHIiLRLHKSKKLVNQLLQIiNYKIENPYDLLVDF 120

Query: 121 EKQTPIQIRENTYLQSIVTELKRSLPEFRTEVATIVHGDIKHSNWVITTSGLIYLVDWDS 180
E+ P+QI++N+YLQ+IV ELKRSLPEF++EVATIVHGDIKHSNWVITTSG+I+LVDWDS
Sbjct: 121 EQNAPLQIQQNSYLQAIVKELKRSLPEFKSEVATIWGDIJCHSNWVITTSGMIFLVDWDS 180

Query: 181 VRLTDRMYDVAYILSHYIPQKHWKDWLSYYGYKDNEKVWSKIIWYGQFSYLSQIIKCFDK 240
VRLTDRMYDVAY+LSHYIP+ W +WLSYYGYK+N+KV KIIWYGQFS+L+QI+KCFDK
Sbjct: 181 VRLTDRMYDVAYLLSHYIPRSRWSEWLSYYGYKNNDKVMQKIIWYGQFSHLTQILKCFDK 240

Query: 241 RDMEHVNQEIYELRKFRELIKK 262
RDMEHVNQEIY LRKFRE+ +K
Sbjct: 241 RDMEHVNQEIYALRKFREIFRK 262





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1121 

A DNA sequence (GBSx1196) was identified in S.agalactiae <SEQ ID 3467> which encodes the amino
acid sequence <SEQ ID 3468>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4529 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



MRVRKRKGAEEHLENNPHYVISNPEEAKGRWKEIFGMNNPIHIEVGSGKGAFITGMAEQN 60
MR+R + A++ L N ISNP + KG+W+ +FGN+NPIHIEVG+GKG FI+GMA+QN
MRMRHKPWADDFIAENADIAISNPADYKBKMNTVFGNDNPIHIEVGTGKGQFISGiyiAKQN 60

Query: 61 PDINYIGIDIQLSVLSYALDKVLDSGAKNIiajLLVDGSSLSNYFDTGEVDLMYLNFSDPW 120
PDINYIGI++ SV+ A+ KV DS A+N+KLL +D +L++ F+ GEV +YLNFSDPW
Sbjct: 61 PDINYIGIELFKSVIVTAVQKVKDSEAQKr^tNIDADTLTDVFEPGEVKRVYLNFSDPW 120

Query: 121 PKKKHEKRRLTYKTFLDTYKDILPEQGEIHFKTDNRGLFEYSLASFSQYGMTLKQVWLDL 180
PKK+HEKRRLTY FL Y++++ + G IHFKTDNRGLFEYSL SFS+YG+ L V LDL
Sbjct: 121 PKKRHEKRRLTYSHFLKKYEEVMGKGC-SIHFKTDNRGLFEYSLKSFSEYGLLLTYVSLDL 180

Query: 181 HASDYQQNIMTEYERKFSNKGQVIYRVEARF 211
H S+ + NIMTEYE KFS GQ IYR E +
Sbjct: 181 HNSNLEGNIMTEYEEKFSALGQPIYRAEVEW 211


A related DNA sequence was identified in S.pyogenes <SEQ ID 3469> which encodes the amino acid 
sequence <SEQ ID 3470>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3303 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 179/211 (84%), Positives = 193/211 (90%) 



MRVRKRKGAEEHL NNPHYVI NPE+AKC

Query: 121 PKKKHEKRRLTYKTFLDTYKDILPEQGEIHFKTDNRGLFEYSLASFSQYGMTLKQVWLDL 180
PK KHEKRRLTYK FLDTYK ILPE GEIHFKTDNRGLFEYSLASFSQYGMTL+Q+WLDL
Sbjct: 121 PKTKHEKRRLTYKDFLDTYKRILPEHGEIHFKTDNRGLFEYSLASFSQYGMTLRQIWLDL 180

Query: 181 HASDYQQNIMTEYERKFSNKGQVIYRVEARF 211
HAS+Y+ N+MTEYE KFSNKGQVIYRVEA F
Sbjct: 181 HASNYEGNVMTEYEEKFSNKGQVIYRVEANF 211



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1122 

A DNA sequence (GBSx1197) was identified in S.agalactiae <SEQ ID 3471> which encodes the amino
acid sequence <SEQ ID 3472>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1311 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06136 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 61/124 (49%) , Positives = 81/124 (65%) , Gaps = 2/124 (1%) 

Query: 2 GGDYVLSILIDKPGGITVEDTAQLTDWSPLLDTIQPDPFPEQYMLEVSSPGLERPLKTA 61 

G D+ L + ID G+ +ED ++++ +S LD + DP + Y LEVSSPG ERPLK 
Sbjct: 33 GKDWFLRVFIDSETGVDLEDCGKVSERLSEKLD--ETDPIEQAYFLEVSSPGAERPLKRE 90 

Query: 62 EALSNAVGSYINVSLYKSIDKVKIFEGDLLSFDGETLTIDYMDKTRHKTVDIPYQTVAKA 121

+ L ++G ++V+LY+ ID K EG+L FDGETLTI+ KTR KTV IPY VA A 
Sbjct: 91 KDLLRSIGKIWHVTLYEPIDGEKALEGELTEFDGETLTIEIKIKTRKKTVTIPYAKVASA 150 

Query: 122 RLAV 125 
RLAV 

Sbjct: 151 RLAV 154 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3473> which encodes the amino acid 
sequence <SEQ ID 3474>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3445 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/127 (79%), Positives = 117/127 (91%)

Query: 1 MGGDYVLSILIDKPGGITVEDTAQLTDWSPLLDTIQPDPFPEQYMLEVSSPGLERPLKT 60

MG DY+LSIL+DK GGITVEDT+ LT+++SPLLDTI PDPFP QYMLEVSSPGLERPLKT 
Sbjct: 52 MGSDYILSILVDKEGGITVEDTSDLTNIISPLLDTIDPDPFPNQYMLEVSSPGLERPLKT 111 

Query: 61 AEALSNAVGSYINVSLYKSIDKVKIFEGDLLSFDGETLTIDYMDKTRHKTVDIPYQTVAK 120 

A++L AVGSYIWSLY++IDKVK+F+GDLL+FDGETLTIDY+DKTRHK V+IPYQ VAK 
Sbjct: 112 ADSLKAAVGSYI1TVSLYQAIDKVKVFQGDLLAFDGETLTIDYLDKTRHKIVNIPYQAVAK 171 

Query: 121 ARLAVKL 127 

R+AVKL 
Sbjct: 172 VRMAVKL 178 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1123 

A DNA sequence (GBSx1198) was identified in S.agalactiae <SEQ ID 3475> which encodes the amino
acid sequence <SEQ ID 3476>. This protein is predicted to be N utilization substance protein A homolog
(nusA). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5069 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9565> which encodes amino acid sequence <SEQ ID 9566> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13533 GB:Z99112 nusA [Bacillus subtilis] 
Identities = 164/370 (44%) , Positives = 251/370 (67%) , Gaps = 15/370 (4%) 

Query: 4 MSKEMLEAFRILEEEKHINKEDIIDAVTESLKSAYKRRYGQSESCVIEFNEKKADFTVYT 63 

MS E+L+A ILE+EK I+KE II+A+ +L SAYKR + Q+++ ++ N + V+ 
Sbjct: 1 MSSELLDALTILEKEKGISKEIIIEAIEMUjISAYKKNFNQAQNVRVDENRETGSIRVFA 60 

Query: 64 VREWDEVFDSRLEISLKDALAISSAYELGDKIRFEESVTEFGRVAAQSAKQTIMEKMRR 123 

++WDEV+D RLEIS+++A I Y +GD + E + +FGR+AAQ+AKQ + +++R 
Sbjct: 61 RKDVVDEVYDQRLEISIEFAQGIHPEYMVGDVVEIEVTPKDFGRIAAQTAKQVVTQRVRE 120 

Query: 124 Q^EVTFNEYKQHEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGESFKSHDMIDV 183 

R V ++E+ E +IMTG V+R D +FIYV+LG +EA L +Q+P ES+K HD I V 
Sbjct: 121 AERGVIYSEFIDREEDIMTGIVQRLDNKFIYVSLGKIEALLPVNEQMPNESYKPHDRIKV 180 

Query: 184 YOTKVFJJNPKGVWWSRSHPEFIKRINEREIPEVMIGTVEIMSVSREAGDRTKVAVRSH 243 

Y+ KVE KG ++VSR+HP +KR+ E E+PE++DGTVE+ SV+REAGDR+K++VR+ 
Sbjct: 181 YITKVEKTTKGPQIWSRTHPGLLKRLFEIEOTEIYDGTVELKSVAREAGDRSKISVRTD 240 

Query: 244 NSNVDAIGTIVGRGGSNIKKVISNFHPKRVDAKTGLEIPVEENIDVIQWVEDPAEFIYNA 303 

+ +VD +G+ VG G ++ +++ E ID++ W DP EF+ NA 

Sbjct: 241 DPDVDPVGSCVGPKGQRVQAIVNELK GEKIDIVNWSSDPvEFVANA 286

Query: 304 IAPAEVDMVLFDDEDTKRATVWPDSKLSIAIG!^GQNWLAAHLTC3YRIDIKSASEYEK 363

++P++V V+ ++E+ K TV+VPD +I.SLAIG+RGQN RIAA LTG++IDIKS ++ + 
Sbjct: 287 LSPSKVLDVIVNEEE-IOiTTVIVPDYQLSLAIGKRGQNARIiRAKLTGWKIDIKSETDARE 345 

Query: 364 MEAQELQTEE 373 

Sbjct: 346 LGIYPRELEE 355 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3477> which encodes the amino acid 
sequence <SEQ ID 3478>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2074 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 348/380 (91%), Positives = 361/380 (94%), Gaps = 2/380 (0%) 

Query: 4 MSKE^EAFRILEEEKHINKEDIIDAOTESLKSAYKRRYGQSESCVIEFNEKKADFTVYT 63 

MSKEMLEAFRILEEEKHI+K DIIDAVTESLKSAYKRRYGQSESCVIEFNEK ADF V+T 
Sbjct: 12 MSKEMLEAFRILEEEKHIDKADIIDAVTESLKSAYKRRYGQSESCVIEFNEKTADFQVFT 71 

Query: 64 TOEVVDEVFDSRLEISLKDALAISSAYELGDKIRFEESVTEFGRVAAQSAKQTIMEKMRR 123 

VREW+EVFDSRLEISLKDALAISSAYELGDKIRFEESV EFGRVAAQSAKQTIMEKMRR 
Sbjct: 72 VREWEEVFDSRLEISLKDALAISSAYELGDKIRFEESVNEFGRVAAQSAKQTIMEKMRR 131 

Query: 124 QMREVTFNEYKQHEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGESFKSHDMIDV 183 

QMREV FNEYK+HEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGEl FKSHD IDV 
Sbjct: 132 QMREVMFNEYKEHEGEIMTGTVERFDQRFIYVNLGSLEAQLSHQDQIPGETFKSHDRIDV 191 

Query: 184 YWKVENNPKGVWFVSRSHPEFIKRIMEREIPEVFDGTVEIMSVSREAGDRTKVAVRSH 243 

YVYKVENNPKGVNVFVSRSHPEFIKRIME+EIPEVFDGTVEIMSVSREAGDRTKVAVRSH 
Sbjct: 192 YVYKVENNPKGVNVWSRSHPEFIKRIMEQEIPEVFDGTVEIMSVSREAGDRTKVAVRSH 251 

Query: 244 NSNVDAIGTIVGRGGSNIKKVISNFHPKRVDAKTGLEIPVEENIDVIQWVEDPAEFIYNA 303 

N NVDAIGTIVGRGGSNIKKVIS FHPKRVDAKTGLEIPVEENIDVIQWV+DPAEFIYNA 
Sbjct: 252 NPNVDAIGTIVGRGGSNIKKVISKFHPKRTOAKTGLEIPVEENIDVIQWVDDPAEFIYNA 311 

Query: 304 IAPAEVD^tVLFDDEDTKRATVWPDSKr^rAIGRRGQNVRLAAHLTGYRIDIKSASEYEK 363 

IAPAEVDMVLFDDED KFATVWPDSKXiSLAIGRRGQNVRLAAHLTGYRIDIKSASEY++ 
Sbjct: 312 IAPAEVDMVLFDDEDLKRATVWPDS KLSLAI GRRGQNVRLAAHLTGYR I DIKSASEYDR 371 

Query: 364 MEAQELQTEEVAQESEVISD 383 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1124 

A DNA sequence (GBSx1199) was identified in S.agalactiae <SEQ ID 3479> which encodes the amino
acid sequence <SEQ ID 3480>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2012 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13534 GB:Z99112 alternate gene name: ymxB-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 46/92 (50%), Positives = 67/92 (72%), Gaps = 1/92 (1%) 

Query: 1 MAKTKKIPLRKSWSGEV1DKRDLLRIVKNKEGQVFIDPTGKQNGRGAYIKLDNDEAILA 60 

M K KKIPLRK W+GE+ K++L+R+V++KEG+4- +DPTGK+NGRGAY4- LD + + A 
Sbjct: 1 MNKHKKIPLRKCVVTGEMKPiaCELIRVVTlSKEGEISVDPTGKKNGRGAYLTLDKECILaA 60 

Query: 61 KKKRVFDRSFSMEVSDEFYDELL&YVDHKVKR 92 

KKK F ++ D+ +DELL + KVK+ 

Sbjct: 61 KKKNTLQNQFQSQIDDQI FDELLELAE - KVKK 91 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3481> which encodes the amino acid
sequence <SEQ ID 3482>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1008 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/98 (78%) , Positives = 92/98 (93%) 

Query: 1 MAKTKKIPLRKSWSGEVIDKRDLLRIVKNKEGQVFIDPTGKQNGRGAYIKLDNDEAILA. 60 

M+K KKIPLRKS+VSGE+I KRDLLRIVK K+GQVFIDPTGKQNGRGAYI KLDN EA++A 
Sbjct: 2 MSKVKKIPLRKSIiVSGEIIAKRDLLRIVKTKDGQVFIDPTGKQNGRGAYIKLDNQEALiylA 61 

Query: 61 KKKRVFDRSFSMEVSDEFYDELLAYVDHKVKRRELGLE 98 

KKK+VF+RS FSM++ + FYD+I1+AYVDHK+KRRELGL+ 
Sbjct: 62 KKKQVFNRSFSMDIPESFYDDLIAYVDHKIKRRELGLD 99 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1125 

A DNA sequence (GBSx1200) was identified in S.agalactiae <SEQ ID 3483> which encodes the amino
acid sequence <SEQ ID 3484>. This protein is predicted to be probable ribosomal protein in infB 5' region.
Analysis of this protein sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06133 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 46/95 (48%), Positives = 65/95 (68%), Gaps = 1/95 (1%)

Query: 6 KVUJLIGIAQFAGRLITGEELVIKAIQNQQVSLIFIANDAGPN^ 65
K L+L+GLA RA +L+TGEE V+KA+QN QV+L+ L++DAG + KK+ DK Y+ V
Sbjct: 5 KWLSLLGIAARARQLLTGEEQWKAVQNGQVTLVILSSDAGIHTKKKLLDKCGSYQIPVK 64

Query: 66 TVFSTLELSDALGK- PRKWAVADAGFSKKMRTLM 99 

V + L A+GK R V+ V DAGFS+K+ L+ 
Sbjct: 55 WGNRQMLGRAIGKHERWIGVKDAGFSRKLAALI 99 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3485> which encodes the amino acid 
sequence <SEQ ID 3486>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1950 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 75/99 (75%) , Positives = 88/99 (88%) 

Query: 1 MNNSEKVI^LIGIAQRAGRLITGEELVIKAIQNC^VS 60 

+ N E++ +LIG AQRAG+ + 1 +GEELV+KAIQ+QQV L+FLANDAGPN+TKKVTDKSNYY 
Sbjct: 1 LTNLERLSSIjIGPAQRAGKVISGEELVVTCAIQHQQVILVFLANDAGPNVTKKVTDKSNYY 60 

Query: 61 KTEVSTVFSTLELSDALGKPRKWAVAEAGFSKKMRTLM 99 

EVSTV + LELS ALGKPRKV A+ADAGFSKKMRUM 
Sbjct: 61 NVEVSTVLNALELSAALGKPRKVAAIADAGFSKKMRTLM 99 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1126 

A DNA sequence (GBSx1201) was identified in S.agalactiae <SEQ ID 3487> which encodes the amino
acid sequence <SEQ ID 3488>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2873 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10959> which encodes amino acid sequence <SEQ ID
10960> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 3489> which encodes the amino acid 
sequence <SEQ ID 3490>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2985 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 735/961 (76%), Positives = 805/961 (83%), Gaps = 42/961 (4%)

Sbjct: 1 LSKKRLHEIAKEIGKSSKEWEHAKYLGLDVKSHASSVEEADAKKIISSFSKASKPDVTA 60

Query: 60 NSVQTNQGVKTESKTVETKQGLSDDKPSTQPVAKPKPQSRNFKAEREARAKAEAEKRQHN 119 

+ +V STV+GS+ TQ V+KPK SRNFKAEREARAK +A ++Q N 

Sbjct: 61 SQTVKPKEVAQPSVTWKETG- SEHVEKTQ - VSKPK- - SRNFKAEREARAKEQAARKQAN 116 

Query: 120 GD HRKNNRHNDTRSDDRR- -HQGQKRSNGNR NDNRQ--G 154 

G +R+ N H D+R H+ Q +N R +DW Q G 

Sbjct: 117 GSSHRSQERRGGYRQPNNHQTNEQGDKRITHRSQGDTNDKRIERKASNVSPRHDNHQLVG 176 

Query: 155 QQNN RNKNDGRYADHKQKPQTRPQQPAGNRIDFKARAAALKAEQNAEYSRHSEQRF 210 

+N N +GR+ + K++ + PQ + +IDFKARA&ALKAEQNAEYSR SE RF 

Sbjct: 177 DRNRSFAKENHKNGRFTNQKKQGRQEPQSKSP-KIDFKARAAALKAEQNAEYSRQSETRF 235 

Query: 211 REEQEAKRQAAKEQELAKAAALKAQEEAQKAICEKLASKPVAKVKEIVNKVAATPSQTADS 270 

R +QEAKR A ++ AK AALKAQ E +EAK+ + ++ + TAD+ 

Sbjct: 236 RAQQEAKRLAELARQEAKEAALKAQAEEMSHREA-ALKSIEEAETKLKSSNISAKSTADN 294 

Query: 271 RRKKQTRSDKSRQFSNENEDGQKQTRNKKNWNNQSQVRNQRJISNWNHNKKNKKGK T 326 

RRKKQ R +K+R+ ++ +++GQK +NKK+WN+QNQVRNQ+NSNWN NKK KKGK T 
Sbjct: 295 RRKKQARPEKNRELTHHSQEGQK--KNKKSTOSQNQVRNQKNSMNKNKKTKKGKNVKNT 352 

Query: 327 NGAPKPVTERKFHELPKEFEYTEGMTVAEIAKRIKREPAEIVKKLFMMGVMATQNQSLDG 386 

N APKPVTERKFHELPKEFEYTEGMTVAEIAKRI KREPAEIVKKLFMMGVMATQNQSLDG 
Sbjct: 353 NTAPKPVTERKFHELPKEFEYTEGMTVAEIAKRIKREPAEIVKKLFMMGVMATQNQSLDG 412 

Query: 387 DTIELLMVDYGIEAHAKVEVDEADIERFFADEDYLNPDNITERPPVVTIMGHVDHGKTTL 446 

DTIELLMVDYGIEA AKVEVD+ADIERFF DE+YLNP+N+ ER PWTIMGHVDHGKTTL 
Sbjct: 413 DTIELLMVDYGIEAKAKVEVDDADIERFFEDENYU^ENIVERAPVVTIMGHVDHGKTTL 472 

Query: 447 LDTLRNSRVATGEAGGITQHIGAYQIEEAGKKITFLDTPGHAAFTSMRARGASVTDITIL 506 

LDTLRNSRVATGEAGGITQHIGAYQIEEAGKKITFLDTPGHAAFTSMRARGASVTDITIIj 
Sbjct: 473 LDTLRNSRVATGEAGGITQHIGAYQIEEAGKKITFLDTPGHAAFTSMRARGASVTDITIL 532 

Query: 507 IVAADDGVMPQTVEAINHSKAAGVPIIVAINKIDKPGANPERVISELAEHGVISTAWGGE 566 

IVAADDGVMPQT+EAINHSKAAGVPIIVAINKIDKPGANPERVI+ELAE+G+ISTAWGGE 
Sbjct: 533 I VAADDGVMPQTI EAINHS KAAGVP 1 1 VAINKIDKPGANPERVI AELAEYGI I STAWGGE 592 

Query: 567 SEFVEISAKFGKNIQELLETVLLVAEMEELKADADVRAIGTVIEARLDKGKGAVATLLVQ 626 

EFVEISAKF KNI ELLETVLLVAE+EELKAD VRAIGTVIEARLDKGKGA+ATLLVQ 
Sbjct: 593 CEFVEISAKFNKNIDELLETVLLVAEVEELKADPTVRAIGTVIEARLDKGKGAIATLLVQ 652 

Query: 627 QGTLWQDPIWGNTFGRVRAMTNDLGRRVKVAGPSTPVSITGLNEAPMAGDHFAVYADE 686 

QGTL+VQDPIWGNTFGRVRAM NDLGRRVK A PSTPVSITGLNE PMAGDHFAVYADE 
Sbjct: 653 QGTLHVQDPIWGNTFGRVRAMVNDLGRRVKSA3PSTPVSITGLNETPMAGDHFAVYADE 712 

Query: 687 I<AAFAAGEERAKFMLKQRQOTQRVSLENLFDTLKAGEVKSVWIIKADVQGSVFAIAAS 746 

KAARAAGEER+KRALLKQRQNTQRVSIj+NLFDTLKAGE+K+WI IKADVQGSVEALAAS 
Sbjct: 713 KAAPJ^GEERSFEALLKQRQNTQRVSLDNLFDTLKAGEIKTVIJVI IKADVQGSVEALAAS 772 

Query: 747 LLKIDVEGVKVNVVHSAVGAINESDVTIJVEASNAVIIGFMVRPTPQARQQADADDVEIRQ 806 

L+KI+VEGV+VNVVHSAVGAirffiSDTCIiAiaSNAVIIGFNWPTPQARQQAD DDVEIR 
Sbjct: 773 LVKIEVEGTOV1WVHSAVGAINESDVTIAEASNAVIIC-FN\'RPTPQARQQADTDDVEIRL 832 

Query: 807 HSIIYKVIEEWEAMKGKLDPBTYQEKILGEAIIRETFKVSKVGTIGGFMVINGKVTRDSS 866

HS I IYKVIEEVEEAMKGKLDP YQEKILGEAI IRETFKVSKVGTIGGFMVINGKVTRDSS 
Sbjct: 833 HS 1 1 YKVIEEVEEAMKGKLDPVYQEKILGEAI IRETFKVSICVGTIGGFMV1NGKVTRDSS 892 

Query: 867 WVIRDGWIFDGKIASLKHYKTDXTKEVGNAQEGGMIEriYNDLKEDDTIEAYIMEEIKRK 927 

VRVIRD WI FDGKLASLKHYKDDVKE VGNAQEGGLM I EN+NDLK DDTIEAYIMEEI RK 
Sbjct: 893 VRVTRDS WI FDGKLASLKHYKDDVKE VGNAQEGGLM I ENFNDLKVDDT I EAYIMEEI VRK 953 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1127

A DNA sequence (GBSx1202) was identified in S.agalactiae <SEQ ID 3491> which encodes the amino
acid sequence <SEQ ID 3492>. This protein is predicted to be ribosome binding factor A (rbfA). Analysis 
of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2557 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
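
The "Final Results" blocks in this section all share one layout: a compartment name, a certainty value and a verdict in parentheses. As a minimal sketch, and assuming only that layout, the lines can be read into a small table and the highest-scoring compartment picked out. The helper names `parse_final_results` and `most_likely` are hypothetical and are not part of the prediction program; the pattern is deliberately tolerant of stray spaces inside the number.

```python
import re

# Hypothetical helper for illustration only. The pattern tolerates stray
# spaces such as "Certainty=0 .2557" and any dash run before "Certainty".
_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([0-9. ]+?)\s*\((.*?)\)",
    re.IGNORECASE,
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for each localisation line."""
    results = {}
    for compartment, number, verdict in _LINE.findall(text):
        certainty = float(number.replace(" ", ""))
        results[compartment.lower()] = (certainty, verdict.strip())
    return results


def most_likely(results):
    """Return the compartment with the highest certainty value."""
    return max(results, key=lambda name: results[name][0])


example = """
bacterial cytoplasm --- Certainty=0.2557 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

parsed = parse_final_results(example)
print(most_likely(parsed), parsed[most_likely(parsed)])
```

The same function can be applied to any of the other result blocks in this section, since they differ only in the values.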

A related GBS nucleic acid sequence <SEQ ID 9567> which encodes amino acid sequence <SEQ ID 9568> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3493> which encodes the amino acid
sequence <SEQ ID 3494>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4765 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 93/117 (79%), Positives = 103/117 (87%)

Query: 8 LIMANHRIDRVGMEIKREV^ILRLRVM)PRVQDVTITDVQMLGDLSMAKVFyriHSTLA 67 
+ MANHRIDRVGMEIKREVN+IL+ +V DPRVQ VTIT+VQM GDLS+AKV+YTI S LA 
Sbjct: 1 MAMANHRIDRVGMEIKREvlTOILQKKVRDPRVQGOTITEVQ 60

Query: 68 SDNQKAQIGLEKATGTIKIIELGKNLTMYKIPDLQFVKDESIEYGNKIDEMLRNIjDKK 124 

SDNQKAQ GLEKATGTIKRELGK LTMYKIPDL F KD SI YGNKID++LR+LD K 
Sbjct: 61 SDNQKAQTGLEKATGTIKRELGKQLTMYKIPDLVFEKDNSIAYGNKIDQLLRDLDNK 117 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
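
Each alignment in this section is summarised by a line of counts and percentages, for example 93/117 (79%) identities for the GAS and GBS proteins above. The sketch below parses such a line and recomputes the percentage from the counts. It assumes the percentage is obtained by integer truncation; that assumption matches the identity and gap figures printed in this section but is an observation about these figures, not a statement about the alignment program, and the printed "Positives" percentages are sometimes one point lower than truncation gives. The function names are hypothetical.

```python
import re

# Hypothetical helpers for illustration only.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {name: (count, length, printed_percent)} from a summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in _STAT.findall(line)
    }


def truncated_percent(count, length):
    """Percentage of count over length, truncated to an integer (assumed rule)."""
    return (100 * count) // length


line = "Identities = 93/117 (79%) , Positives = 103/117 (87%)"
stats = parse_alignment_stats(line)
count, length, printed = stats["Identities"]
print(truncated_percent(count, length) == printed)
```

A mismatch between the recomputed and printed identity percentage is a quick signal that a digit in the summary line may have been damaged.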

Example 1128 

A DNA sequence (GBSx1203) was identified in S.agalactiae <SEQ ID 3495> which encodes the amino
acid sequence <SEQ ID 3496>. This protein is predicted to be esterase. Analysis of this protein sequence
reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA79277 GB:M64783 acetyl-hydrolase [Streptomyces hygroscopicus] 
Identities = 58/220 (26%), Positives = 90/220 (40%), Gaps = 8/220 (3%)

Query: 98 WNDNGKAMQKTIFYLAGGSYLmiPTPyHISMLKTLSTSLDAKIILPIYPKTPRYTYDYAI 157

W + + +T+ YL GGSY H + L + A++ Y+P +A+ 

Sbjct: 58 WVRPARQDGRTLLYLHGGSYALGSPQSHRHLSSALGDAAGAAVLALHYRRPPESPFPAAV 117 

Query: 158 PRLVNLYRHFHEKN ANLTLMGDSAGGGLALGLAHALSHQSGQEAIPQPKNIILLSPW 214 

V YR E+ +TL GDSAG GLA+ AL E P + +SPW 

Sbjct: 118 EDAVAAYRMLLEQGCPPGRVTLAGDSAGAGLAVAALQALR DAGTPLPAAAVCISPW 173 

Query: 215 LDVTMKHPEIPKYEDTDPILSAWGIARVGEIWMGSlMTmffiTWSPKNAPATKLAPITLF 274 

D+ + 4 + +L L R+ E ■)- G+ + H SP + T L P+ + 

Sbjct: 174 ADLACEGASHTTRKAREILLDTADLRRMAERYLAGT-DPRHPLASPAHGDLTGLPPLLIQ 232 

Query: 275 TGTREIFFPDIRDYAAQLQAANHPVNYIAQEGMNHVYPIY 314 

G+ E+ D R A PV + M HV+ Y 

Sbjct: 233 VGSEEVLHDDARALEQAALKAGTPVTFEEWPEMFHVWHWY 272 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3497> which encodes the amino acid 
sequence <SEQ ID 3498>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/334 (73%), Positives = 280/334 (83%), Gaps = 6/334 (1%) 

Query: 1 MKPSFKKLLLLFSIITILSIACTPHAKASGRSWKSKFIEQYFWLKRDKSYYKVQDESSFQ 60

+K +K L+ ++ L + TP A AS RSWKSWFIEQYFWLKRDKSYY QD+ SFQ 
Sbjct: 1 LKHPIRKTLVTLGLIaLTLCLP-TPVA-ASSRSWt^SKFIEQYFtVLKRDKSYYSKQDDPSFQ 58 



Query: 61 KYLNASREQSDKGYYLDPNSVNGGLVQERLFDMQVYSWNDNGKANQKTIFYLAGGSYLNN 120 

+YL+A REQSDK Y LD N VNG LVQE h+ MQVYSWNDNGK +QKTI YLAGGSYLNN 
Sbjct: 59 RYLDACREQSDKPYQLDTNLVNGPLVQENLYGMQVYSWNDNGKPDQKTIIYLAGGSYLNN 118 


Query: 181 AGGGIALGLAHALSHQSGQFAIPQPi^IILLSPra,DVTMKHPEIPKYEDTDPILSAWGLA 240 

AGGGLALGLAHAL + E++PQPK +H-LLSPWLDVTM HPEIP+YED DPILS+WGL 
Sbjct: 179 AGGGLALGLAHALHN ESVPQPKQLVLLSPWLDVTMSHPEIPEYEDADPILSSWGLK 234 



Query: 301 YIAQEGMNHVYPIYPIEEAKTAQYQMIDIINKTP 334 

+1 QEGMNHVYPIYPIEEAKTAQYQ4ID INKTP 
Sbjct: 295 FITQEGMNHVYP1 YPIEEAKTAQYQI IDAINKTP 328 



A related GBS gene <SEQ ID 8731> and protein <SEQ ID 8732> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: 11.88
GvH: Signal Score (-7.5): -1.33
Possible site: 28
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 0 value: 4.03 threshold: 0.0
PERIPHERAL Likelihood = 4.03 174
modified ALOM score: -1.31

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

28.4/46.2% over 220aa
Streptomyces hygroscopicus
EGAD|5925| acetyl-hydrolase Insert characterized

ORF00486(589 - 1245 of 1602)
EGAD|5925|5724(57 - 277 of 300) acetyl-hydrolase {Streptomyces hygroscopicus}
%Match = 6.8
%Identity = 28.3 %Similarity = 46.1
Matches = 62 Mismatches = 111 Conservative Sub.s = 39

462 492 522 552 582 612 642 669 

KRDKSYYKVQDESSFQKYLNASREQSDKGr^DPNSTOG 

| : :: | ,|,:|| INI I =| 

ELELWELIELKVfflTRNGEMEPRRIAYDRAQEAFGI^GVPPGDWWGHCTAEWVRPARQDGRTLLYLHGGSYALGSPQS 
20 30 40 50 60 70 80

696 726 756 786 837 867 897 

Y-HISMLKTLSTSLDAKIILPIYPKTPRYTYDYAIPRLVNLYRHFHEKN ANLTLMGDSAGGGLALGLAHALSHQSGQ 

: 1=1 | s | :s | , | : |: | || : |: :|| ||||| |||: :|| 

HRHLS- -SALGDAAGAAVIiALHYRRPPESPFPAAVEDAVAAYRMLLEQGCPPGRVTIAGDSAGAGLAVAALQAL RD

100 110 120 130 140 150 

927 957 987 1017 1047 1077 1107 1137 

EAIPQPKNIILLSPWLDWMKHPEIPKYEDTDPILSAKGIARVGEIWANGSNMINHTWSPKNAPATKLAPITLFTGTRE 
||= =111 1= = = = =1 | |= | : |:= I || : 111== 1=1

AGTPLPAftAVCI S PWADLACEGASHTTRKARE ILLDTADLRRMAERYLAGTD - PRHPLAS PAHGDLTGLPPLLI QVGSEE 
170 180 190 200 210 220 230 

1167 1197 1227 1245 1275 1305 1335 1365 

I FFPDIRDYAAQLQAANHP VNYIAQEGMNHV YPIYPIEEAKTAQYQMIDIINKTP*Y*LSQL*SYKK*TMILTWFI

= = I I 111= I II =1= I = =1 

VLHDDARALEQAALKAGTPVTFEEWPEMFHVWHWYHPVLPEGRRAA.IEVAGAFLRTATGEGLK 
250 260 270 280 290 300 

SEQ ID 8732 (GBS149) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 23 (lane 6; MW 37kDa).

The GBS149-His fusion product was purified (Figure 196, lane 6) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 291), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1129 

A DNA sequence (GBSx1204) was identified in S.agalactiae <SEQ ID 3499> which encodes the amino
acid sequence <SEQ ID 3500>. This protein is predicted to be CopY. Analysis of this protein sequence
reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3140 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 




Query: 62 FSYSPLIDEDLAMMSEVDSVFQKVCQTKHVAIVRHLLESIPMTEKDRLNLQSSLEAKKGK 121 

+ YS LI E+ A+ +V VF ++C TKH A++RHL+E PMT D L++ L +KK 
Sbjct: 63 YIYSSLISEEEALEQQVSEVFSRICVTKHQALIRHLVEETPMTLSDIEKLEALLLSKKAN 122

Query: 122 TLERVACNCI PGQCQCH 138 

4 V CNCI GQC C+ 
Sbjct: 123 AVPEVKCNCIVGQCSCY 139 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3501> which encodes the amino acid
sequence <SEQ ID 3502>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2331 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/135 (40%), Positives = 84/135 (62%)

Query: 3 ISSAEWEIMRVVWAQQIWTSNEIMVLLEKYDWTPSTVKTLLRRLLDKGYVSREKMGKGF 62
IS+AEWE+MRWWA + S++I+ +L +KY W+ ST+KTL+ RL+ K +++ + G+ +
Sbjct: 10 ISAAEWEVMRVVWASGDIKSSDIITILRKKYQWSDSTIKTLIGRLVKKNFLTSYRQGRAY 69

Query: 63 SYSPLIDEDLAMMSEVDSVFQKVCQTKHVAIVRHLLESIPMTEKDRLNLQSSLEAKKGKT 122
Y L+DE L + +V +CQ +H ++ h +PMT ++ Q LE KK
Sbjct: 70 IYQALLDETLLQKEALATVLDGICQRQHTRLLLERLYHLPMTLEEIGAFQELLEVKKENA 129

Query: 123 LERVACNCI PGQCQC 137
+ V CNC+PGQC C
Sbjct: 130 VLEVPCNCLPGQCHC 144



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1130 

A DNA sequence (GBSx1206) was identified in S.agalactiae <SEQ ID 3503> which encodes the amino
acid sequence <SEQ ID 3504>. This protein is predicted to be CopA. Analysis of this protein sequence 
reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.82 Transmembrane 382 - 398 ( 370 - 406) 
INTEGRAL Likelihood = -8.01 Transmembrane 356 - 372 ( 344 - 374) 
INTEGRAL Likelihood = -2.50 Transmembrane 719 - 735 ( 719 - 738) 
INTEGRAL Likelihood = -2.28 Transmembrane 202 - 218 ( 202 - 218)
INTEGRAL Likelihood = -1.59 Transmembrane 693 - 709 ( 691 - 712)
INTEGRAL Likelihood = -1.33 Transmembrane 167 - 183 ( 167 - 183)

Final Results

bacterial membrane --- Certainty=0.4[illegible]7 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG10086 GB:AF296446 CopA [Streptococcus mutans]
Identities = 440/740 (59%), Positives = 571/740 (76%), Gaps = 1/740 (0%)

Query: 5 KETFLIDGMTCASCALTIEKAWKLDHVDSAVVNLATEKMTVTFDDTTLSPWVIEECVSE 64
+E FLIDGMTCASCA+ +E AV KLD ++SAWNL TEKMT+ +D +S + + V+
Sbjct: 3 EEVFLIDGOTCASCAIimNAVKKLDC-IESAVVNLTTEKMTIDYDAAKVSEADvTKAVAG 62

Query: 125 DKGPLNYGMIQLLLTLPVMYFGRIFYQNGFKALFKRHPNMDSLVAIATTAAFIYSLYGLY 184 

PL Y M+ LLLT+PV+ FY NGF++LFK HPNMDSLV++ATTAAF+YSLYG Y 

Sbjct: 123 SSAPLTYAMVLLLLTIPVIVLSWSFYDNGFRSLFKGHPNMDSLVSIATTAAFLYSLYGTY 182 

Query: 185 EILQGDIHYAHQLYFESVAVILTLITLGKYFEILSKGRTSASIEKLLTLSAKEARVIKDG 244 

+ G H+AH LY+ESVAVILTLITLGKYFE LSKGRTS +I+KL+ LSAKEA +I+DG 
Sbjct: 183 H\^YLGHTHHAHHLYYESVAVILTLITLGKYFETLSKGRTSDAIKKLMHLSAKEATLIRDG 242 

Query: 245 EDYMVPLDKVKIGETILVKP6EOPLDGHVVAGESSIDBSMLTGESIPVEKKVGSKVYGA 304 

E+ VP+++V+I + ILVKPGEKIP+DG V++G S+IDESMLTGESIP+EK S VY 
Sbjct: 243 EEIKVPIEQVQIRDQILVKPGEKIPVDGRVLSGHSAIDESMLTGESIPIEKMADSPVYAG 302 

Query: 305 SINGQGSLTIFVEKEAGGSLLSQIINLVEAAQTSKAPIANLADKVSGVFVPFv'IVIAILS 364 

SINGQGSLT EK +LLSQII LVE AQ +KAPIA +ADKVS VFVP +1 IAIL+ 
Sbjct: 303 S INGQGSLTFEAEKVGNETLLSQI I KL VENAQQTKAP IAKI ADKVSAVFVP VI ITIAILT 362 

Query: 365 GLSWYLII/3QSFAFSLKIMIAVLVIACPCALGIiATPTAIMVASGKAAENGILFKGGEVLE 424 

GL WY ++GQ F FS+ I +AVLVIACPCALGLATPTAIMV +G+AAENGIL+K G+VLE 
Sbjct: 363 GLFWYFVMGQDFTFSMTISVAVLVIACPCALGLATPTAIMVGTGRAAENGILYKRGDVLE 422 

Query: 425 KAHHIDTIVFDKTGTLTKGKPEWAIKTYGGDKEEFLGQVASVEKLSNHPLSQTIVNPCAK 484 

AH I +TI VFDKTGT+T+GKPEW +Y D+ + + A++E LS HPLSQ IV+ AK 
Sbjct: 423 LAHQINTIVFDKTGTITQGKPEWHQFSY-HDRTDLVQVTAALEALSEHPLSQAIVDYAK 481 

Query: 485 EKELPLREVMAFICNILGYGLSATIKGKTMLVGNANLMTKlIDVNLDIjAKADIEIAQEEAQT 544 

++ L V F ++ G GL + +T+LVGN LM + +++L+ A+AD + A + QT 
Sbjct: 482 KEGTHLLAVDDFTSLTGLGLKGCVADETLLVGNEKLMRQANISLEQAQADFKAATAQGQT 541 

Query: 545 WYVSENGVLSGLITLTDQLKTDSQET\ r KQLQRLGFNLVLLTGDNKASADAIAQKLGITT 604 

++V+ +G L GLIT+ D++K DS TVK LQ +G + +LTGDN+ 4A AIA+++GIT 
Sbjct: 542 PIFVASDGQLLGLITIADKVICNDSAATVKALQNMGVEVAMLTGDNEETAQAIAKEVGITF 601 



Query: 605 WSEVLPDQKAOTILELKEKGGQIAM\ r GDGINDAPALASSDVGISMSSGTDIAIESADIV 664
V+S+V +K IL+L+ +G ++AMVGDGINDAPALA++D+GISM SGTDIA+ESADIV
Sbjct: 602 VISQVFSQEKTQAILDLQAEGKKVAMVGDGINDAPALATADIGISMGSGTDIAMESADIV 661

Query: 665 LMKPELTDLLKAMTISKQTIQIIKENLFWAFFYNVLAIPVAMGVLHLFGGPLLHPMLAGL 724
LMKP + D++KA+ IS+ TI IKENLFWAF YNVL++P+AMGVL+LFGGPLL+PM-1-AGL
Sbjct: 662 LMKPAMLDIIKALKISRVTIINIKENLFWAFIYNVLSVPIAMGVLYLFGGPLLDPMIAGL 721

Query: 725 AMAFS S VSWLNALRLKVLK 744
AM+FSSVSWLNALRLKV+K
Sbjct: 722 AMSFSSVSWUSIALRLKVVK 741



There is also homology to SEQ ID 3506. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1131 

A DNA sequence (GBSx1207) was identified in S.agalactiae <SEQ ID 3507> which encodes the amino
acid sequence <SEQ ID 3508>. This protein is predicted to be cation-transporting ATPase, P-type (pacS).
Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1934 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG10087 GB:AF296446 CopZ [Streptococcus mutans]
Identities = 31/67 (46%), Positives = 43/67 (63%)

Query: 1 MKHTYRVSGMKCDGCAKWSDKLSSVIGVDEVNvDLTKNQVWSGKTFKWLLKRSLKDTK 60 
M+ TY + G+KC GCA V+ + S + V++V VDL K +V ++G KW LKR+LK T

Sbjct: 1 MEKTYHIDGLKCQGCADNOTKRFSELKKVNDvKTO 60 

Query: 61 YSLEEEI 67 
Y L EI 

Sbjct: 61 YELGAEI 67

A related DNA sequence was identified in S.pyogenes <SEQ ID 3509> which encodes the amino acid
sequence <SEQ ID 3510>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2997 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 33/63 (52%), Positives = 48/63 (75%)

Query: 1 MKHTYRVSGMKCDGCAKWSDKLSSVIGVDEVNTOLTKKQVWSGKTFKWLLKRSLKDTK 60

M+ Y+V+GM CDGCA+TV+4KLS+V GV V V+L K + V+G+ +L+KR+LKDTK 
Sbjct: 1 MEKHYQvTGOTCDGCARTVTEKLSAVPGVQSVQVNLEKGEAKVTGRPLTFLIICRAIjKDTK 60 

Query: 61 YSL 63 
+ L

Sbjct: 61 FEL 63 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1132

A DNA sequence (GBSx1208) was identified in S.agalactiae <SEQ ID 3511> which encodes the amino
acid sequence <SEQ ID 3512>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.59 Transmembrane 67 - 83 ( 65 - 90)
INTEGRAL Likelihood = -3.72 Transmembrane 35 - 51 ( 31 - 51)
INTEGRAL Likelihood = -3.61 Transmembrane 122 - 138 ( 120 - 139)
INTEGRAL Likelihood = -1.59 Transmembrane 154 - 170 ( 154 - 171)
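
The INTEGRAL lines above list one predicted transmembrane segment each, with a likelihood score, a residue span and a second range in parentheses. As a minimal sketch, the lines can be parsed into records and sorted by position along the protein. The function name `parse_segments` and the field names `span` and `outer` are hypothetical; in particular, the meaning of the parenthesised range is not explained in the text, so `outer` is only a label for it.

```python
import re

# Hypothetical parser for illustration only. It tolerates a stray space
# inside the score, as in "-7. 59".
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\s*\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return the listed segments as dicts, ordered by start residue."""
    segments = []
    for score, start, end, low, high in _TM.findall(text):
        segments.append(
            {
                "likelihood": float(score.replace(" ", "")),
                "span": (int(start), int(end)),
                "outer": (int(low), int(high)),  # parenthesised range, meaning assumed
            }
        )
    return sorted(segments, key=lambda seg: seg["span"][0])


report = """
INTEGRAL Likelihood = -7. 59 Transmembrane 67 - 83 ( 65 - 90)
INTEGRAL Likelihood = -3.72 Transmembrane 35 - 51 ( 31 - 51)
INTEGRAL Likelihood = -3.61 Transmembrane 122 - 138 ( 120 - 139)
INTEGRAL Likelihood = -1.59 Transmembrane 154 - 170 ( 154 - 171)
"""

segments = parse_segments(report)
strongest = min(segments, key=lambda seg: seg["likelihood"])
print(len(segments), strongest["span"])
```

Sorting by start residue gives the order of the segments along the sequence, while the most negative likelihood identifies the segment the program scores most strongly.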

Final Results 

bacterial membrane --- Certainty=0.4036 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8733> which encodes amino acid sequence <SEQ ID 8734> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 4.09
GvH: Signal Score (-7.5): 3.87
Possible site: 20
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 4 value: -7.59 threshold: 0.0
INTEGRAL Likelihood = -7.59 Transmembrane 65 - 81 ( 63 - 88)
INTEGRAL Likelihood = -3.72 Transmembrane 33 - 49 ( 29 - 49)
INTEGRAL Likelihood = -3.61 Transmembrane 120 - 136 ( 118 - 137)
INTEGRAL Likelihood = -1.59 Transmembrane 152 - 168 ( 152 - 169)

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4036 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 8 WNILSLVGTVAFASSGAIVAIEEEFDILGLFILGFVTAFGGGAIRNVLIGLPIETLWSQG 67 

W +LS++G +AFA SGAIVA+EEE+DILG++ILG VTAFGGGAIRN+LIG+P+ LW QG 
Sbjct: 3 VfELLSVIGIIAFAVSGAIVAMEEEYDILGVYILGIVTAFGGGAIRNLLIGVPVSALWEQG 62 

Query: 68 IAFYAAAAAILFIMIFPNLLSGKGRDAEWSDAIGLAAFSVQGALYATQSHQPLSAVIVA 127 

F A +1 + +FP LL +SDAIGLAAF++QGALYA + PLSAVIVA 

Sbjct: 63 AYFQIALLSITIVFLFPKLLLKHVWKWGNLSDAIGIjAAFAIQGALyAVKMGHPLSAVIVA 122 

Query: 128 AVLTGAGGGIVRDVLAGRKPGVLRSEIYAGWSILVGIILYFKIAKTTTDYYLLVLWTSL 187 

AVLTG+GGGI +RD+LAGRKP VL++EIYA W+ L G+I+ + Y+L V+ 

Sbjct: 123 AVLTGSGGGIIRDLLAGRKPLVLKAEIYAWAALGGLIVGLGWLGNSFGLYVLFFVLWC 182 

Query: 188 RMLGYKKQWHLP 199 

R+ Y W LP 
Sbjct: 183 RVCSYMFNWKLP 194 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3513> which encodes the amino acid 
sequence <SEQ ID 3514>. Analysis of this protein sequence reveals the following: 

Possible site: 27 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.15 Transmembrane 70 - 86 ( 65 - 88) 

INTEGRAL Likelihood = -4.09 Transmembrane 33 - 49 ( 29 - 49) 

INTEGRAL Likelihood = -2.13 Transmembrane 120 - 136 ( 119 - 137) 

INTEGRAL Likelihood = -0.43 Transmembrane 173 - 189 ( 172 - 189) 



Final Results 
bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05428 GB:AP001512 unknown conserved protein [Bacillus halodurans]
Identities = 109/195 (55%), Positives = 137/195 (69%)

Query: 6 WEILNIIGTIAFALSGAIVAMEEEFDILGIFILGFOTAFGGGAIRNTLIGLPIEALWGQK 65 
W++LN+IGTIAFALSG IVAMEE+FD++G++ILGFVTAFGGGAIRN LIG+P+ ALW Q

Sbjct: 3 TOVLNVIGTIAFALSGVIVAMEEDFDLMGVYILGFVTAFGGGAIRNLLIGVPVSALWEQG 62 

Query: 66 PEFTCAFFAMVLIMLFPKLMARGWVRAAVLTDAIGLAAFSVQGALHAWLNQPLSAVIVT 125 
FT AF M + PL W++ +L DAIGLAAF~+QGAL A ++ PLSAVIV 
Sbjct: 63 TLFTIAFIVMTIAFFLPNLWINHWLKFGLLFDAIGLAAFAIQGALFATSMDHPLSAVIVA 122



Query: 126 AVLTGAGGGVVRDIIAGRKPSVLRSEIYAGWSILAAIVLHFKLADSTIECYALWLLTTL 185 

A LTGAGGG+VRD+LA RKP VL EIY GW++LA + + I r L++L+ L 
Sbjct: 123 AALTGAGGGIVRDMLARRKPLVLSKEIYIGWAMLAGAAIGLNIVSGPIGIGFLIILWFL 182 

Query: 186 RMIGNRKKWNLPKIK 200 

RM+ W LP K 

Sbjct: 183 RMLSVHYNWCLPHRK 197 



An alignment of the GAS and GBS proteins is shown below.

Identities = 133/200 (66%), Positives = 168/200 (83%)

Query: 3 MSIDIWNILSLVGTVAFASSGAIVAIEEEFDILGLFILGFVTAFGGGAIRMVLIGLPIET 62
M+ID+W IL+++GT+AFA SGAIVA+EEEFDILG+FILGFVTAFGGGAIRN LIGLPIE
Sbjct: 1 MTIDMWEILNIIGTIAFALSGAIVAMEEEFDILGIFILGFvTAFGGGAIRNTLIGLPIEA 60

Query: 63 LWSQGIAFYAAAAAILFIMIFPNLLSGKGRDAEWSDAIGLAAFSVQGALYATQSHQPLS 122
LW Q FA A++ IM+FP L++ A V++DAIGLAAFSVQGAL+A + +QPLS
Sbjct: 61 LWGQKPEFTCAFFAMVLIMLFPKLMARGWVRAAVLTDAIGIjAAFSVCjGALHAVRLNQPLS 120

Query: 123 AVIVAAVLTGAGGGIVRDVLAGRKPGVLRSEIYAGWSILVGIILYFKIAKTTTDYYLLVL 182
AVIV AVLTGAGGG+VRD+LAGRKP VLRSEIYAGWSIL I+L+FK+A +T + Y LV+
Sbjct: 121 AVIVTAVLTGAGGGVTODIIAGRKPSVLRSEIYAGWSILAAIVLHFKLADSTIECYALW 180

Query: 183 WTSLRMLGYKKQWHLPWR 202
++T+LRM+G +K+W+LP ++
Sbjct: 181 LLTTLRMIGNRKKWHLPKIK 200





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1133 

A DNA sequence (GBSx1209) was identified in S.agalactiae <SEQ ID 3515> which encodes the amino
acid sequence <SEQ ID 3516>. Analysis of this protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2805 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9569> which encodes amino acid sequence <SEQ ID 9570> 
was also identified. 
The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB94816 GB:AJ245582 hypothetical protein [Streptococcus thermophilus]
Identities = 138/238 (57%), Positives = 184/238 (76%)

Query: 5 KKMIKLIAIDMDGTLLNDEKKIPKENIQAIKEATQAGIKIVLCTGRPMSGILPYFNELGIi 64 

+ +KLIAIDMDGTLLN +K+IPKENI+AI+EAT AGIKIVLCTGRP SGI+P+F +LGL 
Sbjct: 3 QNQVKLIAIDMDGTLIMSQKEIPKENIKAIQEATAAG1KIVLCTGRPRSGIVPHFEKLGL 62 

Query: 65 TKEEYIIMNNGCSTYSTECDWQLIDSATLTHDELIFLEEWKEFPNVCLTLTAENTFYAVG 124 

++EE+IIMNNGCSTY TK+W L++S +L+ E+ L + ++FP V LT T E ++Y VG 
Sbjct: 63 SEEEFIIMNNGCSTYETKNWTLLESESLSRSEMEELLQACEDFPGVALTFTGEKSYYWG 122 

Query: 125 EEVPEIVAYDADLVFTKAKSTSLDALRNQEEIVFQAKYMGLDADVTAFQEAVEEALISKF 184 

EVPE+VAYDA VFT+AK+ SL+ + + +++FQAMYM + AFQ AV++ L + 

Sbjct: 123 NEVPELVAYDAGTVFTEAKARSLEEIFEEGQVIFQAKYMAESEPLDAFQNAVQDRLDQSY 182 

Query: 185 SGTOSQDYIYEIMPQGVTKARGLKSLIAKLGLDINQVMAIGDAPNDIELLDLVPNSVA 242 

S VRSQ+YI+E+MPQG TKA GLK L KL ++ +Q+MA+GDA ND+E+L V SVA 
Sbjct: 183 STWSQEYIFEVMPQGATKASGLKHLAEKLDINRDQIMALGDAANDLEMLQFVGQSVA 240 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3517> which encodes the amino acid 
sequence <SEQ ID 3518>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1468 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 152/270 (56%), Positives = 193/270 (71%)

Query: 6 KMIKLIAIDMDGTLLNDEKKIPKENIQAIICEATQAGIKIVLCTGRPMSGILPYFNELGLT 65 

+MI+LIAID+DGTLLN +K+IPKENI AI+EA Q+G+KIVLCTGRP SG PYF++LGLT 
Sbjct: 19 RMIQLIAIDLDGTLLNQDKQIPKENITAIQEAAQSGLKIVLCTGRPQSGTRPYFDQLGLT 78 

Query: 66 KEEYIIMNNGCSTYSTKDWQLIDSATLTHDELIFLEEWKEFPNVCLTLTAENTFYAVGE 125 

+EE++I+NNGCSTYS+ DWQL S h ++ LEE+ + FP++ LTLT EN + + E 
Sbjct: 79 QEEFLIINNGCSTYSSPDWQLRHSKMLKVSDIELLEELSQSFPDIYLTLTEENDYLVLEE 138 

Query: 126 EVPEI VAYDADLVFTKAKSTSLDALRNQES I VFQAKYMGLDADVTAFQEA VEEALI SKFS 185 

EVP++V D DLVFT K SL L + ++FQAMY+G A + AF+ AV L F 
Sbjct: 139 EVPDLVQEDGDLVFTIVKPVSLAELSDTPRLIFQAMYLGEKAALDAFERAVRNQLSQSFH 198 

Query: 186 GVRSQDYIYEIMPQGVTKARGLKSLIAKLGLDINQWAIGDAPNDIELLDLVPNSVAMGN 245 

VRSQD I EI+PQGV+KA LK h+ LGL +QVMAIGDAPNDIE+L VAM N 

Sbjct: 199 VWSQDNILEILPQGVSKASALKELVEDLGLTADQVMAIGDAPNDIEMLTYAGLGVAMEN 258 

Query: 246 ASDEIKSRCKYITVDNNKAGVAKAIYDYAL 275 

AS IK +T+ N+ AGVA+AI +AL 

Sbjct: 259 ASAAI KPLADKVTLTNDMAGVAQAI RQFAL 288 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1134 

A DNA sequence (GBSx1210) was identified in S.agalactiae <SEQ ID 3519> which encodes the amino
acid sequence <SEQ ID 3520>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 7 - 23 ( 7 - 23)

Final Results

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 3 NKNKLLLIDGSSVAFRAFFAL^QIDRFKNNSGLH'TNAIYGFHLMLNHILGRVQPSHILV 62 

+K KLLLIDGSSVAFRAFFALY Q+DRFKN +GLHTNAIYGF LML+H+L RV+PSHILV 
Sbjct: 2 DKKKLLLIDGSSVAFRAFFALYQQLDRFKNAAGLHTNAIYGFQLMIiSHLLERVEPSHILV 61 

Query: 63 AFDAGKTTFRTEMYADYKGGRAKTPDEFREQFPYIRQQLDVLGIKHYKIiEHYEADDIIGT 122 

AFDAGKTTFRTEMYADYKGGRAKTPDEFREQFP+IR+ LD +GI+HYEL YEADDIIGT 
Sbjct: 62 AFDAGKTTFRTEMYADYKGGRAKTPDEFREQFPFIRELLDHMGIRHYELAQYEADDIIGT 121 

Query: 123 llAKC^^SNEHFDITWSGDKDLIQLTDTimATEISKKGVAEFEEFTPAYLMEKMGITPS 182 

L K AE + FDIT+VSGDKDLIQLTD +TWEISKKGVAEFE FTP YLME+MG+TP+ 
Sbjct: 122 LDKLAE - - QDGFD ITI VSGDKDL IQLTDEHTWE I SKKGVAEFEAFTPDYLMEEMGLTPA 179 

Query: 183 QFIDLKALMGDKSDNIPGVTKIGEKTGLKLLSEYGSLEGIYENIEAMKQSKMKENLINDK 242 
QFIDLKALMGDKSDNIPGVTK+GEKTG+KLL E+GSLEGI YENI + MK SKMKENL1NDK 



Query: 243 EQAFLSKTLATINIASPITIGLEDILYSGPQDIKALSQFYDEMDFKQFKAALGEETSQED 302

EQAFLSKTLATI+ +PI IGBED++YSGP D++ L +FYDEM FKQ K AL ++ 
Sbjct: 240 EQAFLSKTLATIDTKAPIAIGIiEDLVYSGP-DVENLGKFYDEMGFKQLKQALNMSSADVA 298 

Query: 303 FEVDFTEVEQLKTEMFSDNDFYYFEMLGDNYHVEDLIGIAWGNSDTIYATSNVSLLQEAL 362 

+DFT V+Q+ +M S+ ++FE+ G+NYH ++L+G AW D +YAT + LLQ+ + 
Sbjct: 299 EGLDETIVDQISQDMLSEESIFHFELFGENYHTDNLVGFAWSCGDQLYATDKLELLQDPI 358 

Query: 363 FKKALSKP-IKTYDFKRSKVLLNRFNIDLPEPAFDTRLAKYLLSTTEDNLVSTIARLYTN 421 

FK L K ++ YDFK+ KVLL RF +DL PAFD RLAKYLLST EDN ++TIA LY 
Sbjct: 359 FKDFLEKTSLRVYDFKKVKVLLQRFGVDLQAPAFDIRLAICYLLSTVEDNEIATIASLYGQ 418 

Query: 422 LPLDTDDAWGKGAKRAIPEKTRFLEHIAKCT^TOSFJWIMQQLKANEQEELLFEMEQ 481 

L D+ YGKG K+AIPE+ +FLEHLA K+ VLV++E ++++L N Q ELL++MEQ 
Sbjct: 419 TYLVDDETFYGKGVKKAIPEREKFLEHLACKLAVLVETEPILLEKLSENGQLELLYDMEQ 478 

Query: 482 PLANVIAKMEIRGIKVKKNTIMMAIENQKVIETLTQEIYEIAGQEraiNSPKQLGK^ 541 

P1A VLAKMEI GI VKK TL EM EN+ VIE LTQEIYELAG+EFN+NSPKQLG LLF 
Sbjct: 479 P1AFVIAKMEIAGIVVKKETLDEMQAENELVIEKLTQEIYELAGEEFNVNSPKQLGVLLF 538 

Query: 542 ETLGLPVEMTKKTKTGYSTAVDVLERLAPISPLVTKILEYRQITKLQSTYIIGLQDYILE 601 
E LGLP+E TKKTKTGYSTAVDVLERLAPI+P+V KIL+YRQI K+QSTY+IGLQD+IL 

Query: 602 DGKIHTRYVQDLTQTGRLSSSDPNLQNIPVRLEQGRLIRKAFVPSEDNAVLLSSDYSQIE 661 

DGKIHTRYVQDLTQTGRLSS DPNLQNIP RLEQGRLIRKAFVP +++VLLSSDYSQIE 
Sbjct: 599 DGKIHTRYVQDLTQTGRLSSVDPNLQNIPARLEQGRLIRKAFVPEWEDSVLLSSDYSQIE 658 

Query: 662 LRVIAHISKDEHDIAAFKEGADIHTSTAmVFGIEKPENVTPNDRRNAKAVNFGIVYGIS 721 

LRVLAHI SKDEHLI AF+EGADIHTSTAMEVFGIE+P+NVT NDRKNAKAVNFG+VYGIS 
Sbjct: 659 LRVIAHISKDEHLIKAFQEGADIHTSTA^mVFGIERPDNVTANDRRNAKAVNFGVVYGIS 718 

Query: 722 DFGLSHNLGIPRKIAKQYIDTYFERYEGIKNYI^STVVREAKDKGYVETLFHRRRSLPDIN 781 

DFGLS+NLGI RK AK YIDTYFER+PGIKNYM+ WREA+DKGYVETLF RRR LPDIN 
Sbjct: 719 DFGLSNNLGISRKEAKAYIDTYFERFPGIKNYMDEVVREARDKGYVETLFKRRRELPDIN 778 

Query: 782 SRNFNIRQFAERTAINSPIQGSAADILKIAMIl^RVnDKGGYKSKMLLQvHDEIVLEVP 841 

SRNFNIR FAE TAINSPIQGSAADILKIAMI LD+ L GGY++KMLLQVHDEIVLEVP 
Sbjct: 779 SRNFNIRGFAEATAINS P I QGSAADILKIAMIQLDKALVAGGYQTKMLLQVHDE I VLEVP 838 
Query: 842 NEEIGAIRELOTKTMESAISLSVPLIADENAGETWYEAK 880

E+ +++LV +TME AI LSVPLIADEN G TWYEAK 
Sbjct: 839 KSELVEMKKLVKQTMEEAIQLSVPLIADENEGATWYEAK 877 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3521> which encodes the amino acid 
sequence <SEQ ID 3522>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.43 Transmembrane 7 - 23 ( 7 - 23) 

Final Results 

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 665/881 (75%), Positives = 761/881 (85%), Gaps = 2/881 (0%)





Query: 1
Sbjct: 1

Query: 61
LVAFDAGKTTFRTEMYADYK GRAKTP+EFREQFPYIR+ L LSI +YELEHYEADDII
Sbjct: 61

Query: 121
FD+T-l-VSGDKDLIQLTD NTWE I SKKGVAEFEEFTPAYLMEKMG-i-T
Sbjct: 121

Query: 181
P+QFIDLKALMGDKSDNIPGVTKIGEKTGLKLL E+GSLEGIYE+I+ K
Sbjct: 181

Query: 241
D++QAFLSKTLATIN ASPITIGL+DI+Y+GP D+ +LSQFYDEMDF Q K L
Sbjct: 241

Query: 301
++FS D +YFE L DKYH E +IG AWG+ + IYA++++ LL
Sbjct: 300

Query: 360
KPI TYDFKRSKVLL+ I+L P++D RLA YLLST EDN +STIAR++
Sbjct: 360

Query: 420
T++ L+ DD VYGKGAKRA+P+K LEHLA+KVKVL+DS+4
Sbjct: 420

Query: 480
E PLANVLAKMEI GIKV + TL +MA +N+ +IE LTQEIY++AGQEFNINSPKQLG 4
Sbjct: 480

Query: 540
LFE + LP+EMTKKTKTGYSTAV+VLERLAPI+P+V KIL+YRQITKLQSTY+IGLQDYI
Sbjct: 540

Query: 600
L DGKIHTRYVQDLTQTGRLSS DPNLQNIP+RLEQGRLIRKAF PS ++AVLLSSDYSQ
Sbjct: 600

Query: 660
IELRVLAHIS DEHLIAAF EGADIHTSTAMRVFGI + - +VT NDRRNAKAVNFGIVYG



WO 02/34771 



-1270- 



PCT/GB01/04789 



Sbjct: 660 IBLRVIAHISGDEHLIAAFNEGADIHTSTAI'IRVFaiDRAADVTANDRRNAKAVNFGIVYG 719

Query: 720 ISDFGLSHNLGIPRKLAKQYIDTYFERYPGIKNYMETVVREAKDKGYVETLFHRRRSLPD 779
ISDFGLS+NLGI RK AK YIDTYFERYPGIK YME VWEAKDKGYVETLF RRR LPD
Sbjct: 720 ISDFGLSNNLGITRKQAKSYIDTYFERYPGIKAYMENWREAKDKGYVETLFKRRRELPD 779

Query: 780 INSRKFNIRQFAERTAINSPIQGSAADILKIAMINLDRVLDKGGYKSKMLLQVHDEIVLE 839
INSRNFN+R FAERTAINSPIQGSAADILKIAMINLD+ L GG+++KMLLQVHDEIVLE
Sbjct: 780 INSRNFNTOSFAERTAINSPIQGSAADILKIAMI>ILDKALQAGGFRAKMLLQVHDEIVIjE 839

Query: 840 VPNEEIGAIRELVTKTMESAISLSVPLIADENAGETWYEAK 880
VPN+E+ AI++LV TME+A+ L+VPL DE+ G +WYEAK
Sbjct: 840 VPNDELTAIKKLVKDTMEAAVDLAVPLCVDESTGHSWYEAK 880



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1135 

A DNA sequence (GBSx1211) was identified in S.agalactiae <SEQ ID 3523> which encodes the amino
acid sequence <SEQ ID 3524>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1880 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9571> which encodes amino acid sequence <SEQ ID 9572> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 17 NPSDFMLKNYLTKAKTIAWGLSDRQETAAYQVSKIMQEAGYQIIPVNPKNAGQKILGQM 76 

NPSD +K L +AK IAWGLS + +Y VS MQ AGY+IIPVNP ++LG+ 
Sbjct: 4 NPSDEKIKQILQEAKKIAWGLSGNPDRTSYWSAAMQHAGYEIIPWP--TVDEVLGEK 61 

Query: 77 TYASLKDVTEHIDIVNIFRRSEYLPDIAREFLEVDADIFWAQLGLESQEAETILKQAGHK 136 

SL+D+ +DIVN+FRRSE+LPD+ARE +E+ A +FWAQLGLE++EA L+Q G 
Sbjct: 62 AVPSLQDIEGAVDIVNVFRRSEHLPDVARETVEIGAPVFWAQLGIjENKEAYDYLQQHGVT 121 

Query: 137 QIVMNKCLKVECQK 150 

I MN+C+KVE K 
Sbjct: 122 SI-MNRCIKVEHAK 134 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3525> which encodes the amino acid 
sequence <SEQ ID 3526>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

»> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 0837 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 87/141 (61%), Positives = 114/141 (80%)

Query: 11 MVYHFQNPSDFMLKNYLTKAKTIAWGLSDRQETAAYQVSKIMQEAGYQIIPVNPKNAGQ 70

++Y FQNPS+ +LK YL AKTIAWGLSDR++TAAY V+K MQ Y+IIPVNPK AGQ 
Sbjct: 1 VIYSFQNPSEDVLKAYLESAKTIAWGLSDRKDTAAYGVAKFMQAMDYRIIPVNPKLAGQ 60 

Query: 71 KILGQMTYASLKDVTEHIDIVNIFRRSEYLPDIAREFLEVDADIFWAQLGLESQEAETIL 130

ILG+ YAS+K + +DIV++FRRSE+LP++AR+FL A +FWAQLGLE+QEA+TIL
Sbjct: 61 LILGEKVYASIKAIPFEVDIVDVFRRSEFLPEVARDFLAGQAKVFWAQLGLENQEAQTIL 120

Query: 131 KQAGHKQIVMNKCLKVECQKL 151 

+ AG + IVMN+CLK++ +L 
Sbjct: 121 RSAGKEAIVMNRCLKIDYLQL 141 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1136 

A DNA sequence (GBSx1212) was identified in S.agalactiae <SEQ ID 3527> which encodes the amino
acid sequence <SEQ ID 3528>. Analysis of this protein sequence reveals the following:

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3367 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9573> which encodes amino acid sequence <SEQ ID 9574> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3529> which encodes the amino acid 
sequence <SEQ ID 3530>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4960 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 113/151 (74%), Positives = 133/151 (87%), Gaps = 1/151 (0%) 

Query: 7 MDSHSHGHRPLDAYENVLEHLREKRIRITETRKAlISYIWNSREHPSAEKIYNDLIjPEYP 66 

MD HSH + LDAYENVLEHLREK IRITETRKAIISYM+ S EHPSA+KIY DL P +P 
Sbjct: 1 MDIHSH-QQALDAYENVLEHLREKHIRITETRKAi:SYMIQSTEHPSADKIYRDLQPNFP 59 

Query: 67 NMSIATVYJTOLKVLVDEGFOTELKLCN^ 126 

NMS»TVYNNLKVLVDEGFV+ELK+ N TTYYD FMGHQH+N+ CE CGKI DF+DVD++ 
Sbjct: 60 NMStATVYNNLKVLVDEGFVSELKISNDLTTYYD^ 119 

Query: 127 DISREAHQQTGFEVTRVQLVAYGICPECQRK 157 

DI++EAH+QTG++VTR+ ++AYGICP+CQ K 
Sbjct: 120 DIAKEAHEQTGYKVTRIPVIAYGICPDCQAK 150 
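
The identity figures quoted in these alignment headers can be recomputed from the aligned rows themselves. The sketch below, in Python, counts identical positions in the last block of the alignment above (query residues 127 to 157 against subject residues 120 to 150). The helper names are hypothetical. The integer-division helper reflects an observation from this section only: the printed identity percentages appear to be truncated rather than rounded (for example, 87/141 is 61.7% but is printed as 61%).

```python
def count_identities(query, subject):
    """Count columns where both aligned rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def percent_truncated(part, whole):
    """Integer percentage, truncated toward zero (assumed header behaviour)."""
    return (100 * part) // whole


# Last block of the GAS/GBS alignment shown above.
query = "DISREAHQQTGFEVTRVQLVAYGICPECQRK"
subject = "DIAKEAHEQTGYKVTRIPVIAYGICPDCQAK"

matches = count_identities(query, subject)
print(matches, "identities over", len(query), "columns")
```

Only identities are recomputed here. The "Positives" count also depends on a substitution matrix, which this listing does not name, so it is left out of the sketch.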

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1137

A DNA sequence (GBSx1213) was identified in S.agalactiae <SEQ ID 3531> which encodes the amino
acid sequence <SEQ ID 3532>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.13 Transmembrane 16 - 32 ( 14 - 32)
INTEGRAL Likelihood = -1.81 Transmembrane 496 - 512 ( 496 - 515)

Final Results

bacterial membrane --- Certainty=0.1850 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA06650 GB:AJ005645 sdrc [Staphyl< 
Identities = 41/146 (28%) , Positives 

Query: 4 SQYNKWSIRRLKVGAASVMIASGSTVALGQSHIVSAD EMSQPKTTITAPTANTSTN 59 

++ NK+SIR+ VG AS+++ + I L +A+ E++Q K TAP+ N +T 

Sbjct: 16 NRLNKFSIRKYSVGTASILVGTTLIFGLSGHEAKAAEHTNGELNQSKNETTAPSENKTT- 74 



Query: SO VESSTDKALSKOTTMETSSEMPK--MQNMAKVEKT3DKPMMVATSVRKMMATPTPVAMT- 116 

D K T +++ PK M + A V++TS + T T T 

Sbjct: 75 --KKOTSRQLKDNTQTATADQPKVTMSDSATVKETSSNMQSPQNATANQSTTKTSNVTTN 132 

Query: 117 KTTSVDEVKKSTDTAFKQTVDVP 139 

TT +E KS T K P 
Sbjct: 133 DKSSTTYSNETDKSNLTQAKDVSTTP 158 



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8735> and protein <SEQ ID 8736> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: -0.92
GvH: Signal Score (-7.5): -2.48
Possible site: 39
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -2.13 threshold: 0.0
INTEGRAL Likelihood = -2.13 Transmembrane 16 - 32 ( 14 - 32)
INTEGRAL Likelihood = -1.81 Transmembrane 496 - 512 ( 496 - 515)
PERIPHERAL Likelihood = 7.96 402
modified ALOM score: 0.93
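
In the listings of this section, the "ALOM program" summary line appears to restate the INTEGRAL rows beneath it: "count" matches the number of predicted transmembrane segments and "value" matches the most negative likelihood (2 and -2.13 here; 2 and -9.13, and 4 and -12.26, in later examples). The Python sketch below checks that relation for the two rows above. The regular expression and function names are hypothetical, and the relation itself is inferred from the printed numbers, not stated by the source.

```python
import re

# Hypothetical pattern for the INTEGRAL rows as printed in this document.
SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return (likelihood, start, end) for each predicted segment."""
    return [
        (float(score), int(start), int(end))
        for score, start, end in SEGMENT_RE.findall(text)
    ]


def alom_summary(segments):
    """Return (segment count, most negative likelihood)."""
    return len(segments), min(score for score, _, _ in segments)


report = """
INTEGRAL Likelihood = -2.13 Transmembrane 16 - 32 ( 14 - 32)
INTEGRAL Likelihood = -1.81 Transmembrane 496 - 512 ( 496 - 515)
"""
segments = parse_segments(report)
print(alom_summary(segments))
```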



*** Reasoning Step: 3

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



LPXTG motif: 485-489 



The protein has homology with the following sequences in the databases: 

D| 5981 | 5780 leukotoxin > Insert characterized 
SP|P16462|HLYA_ACTAC LEUKOTOXIN. > Edit characterized

GP|l41834|gb|AAA21922.l| |M27399 leukotoxin (LtA) {Actinobacillus actinomycetemcomitans} 
Insert characterized 



Query: 210 VSLNGNTTGKEGQALLDQI | AND KHSYQATIRVYGAKDGKVDLKNMISPKMVTINIP 266 

++ NG+ + G+A +D +K + KHS + T ++ G +DL + +T P
Sbjct: 488

Query: 267

Sbjct: 547

Query: 323 RLILKASEGAKWSDNGVDKNSPLL PLKDLTKGKYFYQVSLNGNTAGKKGQALLD 376

L A+ GAK V S ++ + D +KG+ ++++G A K GQ ++
Sbjct: 607 EARLIANLGAKDDYVFVGSGSTIVNAGEGYD\rVDYSKGRTG-ALTIDGRNATKAGQYKVE 665

Query: 377 QIKANGSHTYQATITIYGTKDGKV 400

+ +G+ Q T++ TK GKV
Sbjct: 666 R- DLSGTQVLQETVSKQETKRGKV 688

SEQ ID 3532 (GBS1) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 1 (lane 3; MW 78kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 3; MW 53kDa). 

The His-fusion protein was purified as shown in Figure 189, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1138 

A DNA sequence (GBSx1214) was identified in S.agalactiae <SEQ ID 3533> which encodes the amino
acid sequence <SEQ ID 3534>. This protein is predicted to be response regulator (regX3). Analysis of this 
protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3585 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54578 GB:AJ006397 response regulator [Streptococcus pneumoniae] 
Identities = 143/228 (62%), Positives = 183/228 (79%), Gaps = 1/228 (0%)

Query: 1 MTQKLLLVDDEFEIIDINRRYLEQAGYEVSVAADGIEALKEVDENRFDLIISDIMMPKMD 60

M + +LLVDDE EI DI++RYL QAGY+V VA DG+EAL+ + DLII+D+MMP+MD
Sbjct: 1 MGKTILLVDDEVEITDIHQRYLIQAGYQVLVAHDGLEALELFKKKPIDLIITDVMMPRMD 60

Query: 61 GYDFISEVLVREPNQPFLFITAKVSEPDKIYSLSMGADDFISKPFSPRELVLRVKNILRR 120

GYD ISEV P QPFLFITAK SE DKIY LS+GADDFI+KPFSPRELVLRV NILRR
Sbjct: 61 GYDLISEVQYLSPEQPFLFITAKTSEQDKIYGLSLGADDFIAKPFSPRELVLRVHNILRR 120

Query: 121 IYGJSMQQSEVLTIGDLVIDQKQRLvMVDCOTISLTNKBFDLLWILANHLNRVFSKTELYE 180

++ ++E++++G+L ++ V + + LT KSF+LLWILA++ RVFSKT+LYE

Sbjct: 121 LH-RGGETELISLGNLKMNHSSHEVQIGEEMLDLTVKSFSLLWILASNPERVFSKTDLYE 179

Query: 181 RWGEEFLDDTlWIjNVHIHALRiroLAKFSTDNTPTIKTVWGLGYKLEE 228

++W E+++DDTNTLNVHIHALR +LAK+S+D TPTIKTVWGLGYK+E+
Sbjct: 180 KIWKEDYVDDTNTLNVHIHALRQELAKYSSDQTPTIKTVWGLGYKIEK 227

There is also homology to SEQ ID 1182.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1139

A DNA sequence (GBSx1215) was identified in S.agalactiae <SEQ ID 3535> which encodes the amino
acid sequence <SEQ ID 3536>. This protein is predicted to be histidine kinase (resE). Analysis of this
protein sequence reveals the following:

Possible site: 25
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.13 Transmembrane 42 - 58 ( 33 - 65)
INTEGRAL Likelihood = -7.54 Transmembrane 7 - 23 ( 3 - 29)

Final Results

bacterial membrane --- Certainty=0.4652 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKLKYYIVIGYLISMLITVAGVFFGLNHMLIETRGVYYILSVTIIACIVGGIVNLFLLSS 60
MKLK YI++GY+IS L+T+ VF+ + MLI +Y++L +TI+A +VG ++LFLL
Sbjct: 1 MKLKSYILVGYIISTLLTILWFWAVQKMLIAKGEIYFLLGMTIVASLVGAGISLFLLLP 60

Query: 61
VFTSL KLK+ K ++ + F + ++ P EF+ L FN+MS +L+ +F SL ESEREK
Sbjct: 61

Query: 121
Sbjct:

Query: 181
+++I+LDKLLI+ +SEFQ -t
Sbjct: 181

Query: 241
KYS PG+ L + A + + I + D+G GI EDL +IF RLYRVE+SR
Sbjct: 241

Query: 301
NMKTGGHGLGL IAR+LAHQL G+I V SQY GS F+LVL L
Sbjct: 301 NMKTGGHGLGLAIARELAHQLGGEITVSSQYGLGSTFTLVLNL 343

There is also homology to SEQ ID 1178.

A related GBS gene <SEQ ID 8737> and protein <SEQ ID 8738> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 

McG: Discrim Score: 8.67 

GvH: Signal Score (-7.5): -5.75 
Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 2 value: -9.13 threshold: 0.0
INTEGRAL Likelihood = -9.13 Transmembrane 42 - 58 ( 33 - 65)
INTEGRAL Likelihood = -7.54 Transmembrane 7 - 23 ( 3 - 29)
PERIPHERAL Likelihood = 3.92 196
modified ALOM score: 2.33

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4652 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

55.3/72.7% over 343aa
Streptococcus pneumoniae
GP|5830539| histidine kinase Insert characterized

ORF00129(301 - 1332 of 1635)
GP|5830539|emb|CAB54579.1||AJ006397(1 - 344 of 350) histidine kinase {Streptococcus pneumoniae}
%Match = 34.0
%Identity = 55.2 %Similarity = 72.7
Matches = 190 Mismatches = 94 Conservative Sub.s = 60
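
The %Identity and %Similarity figures in this report header are consistent with a simple relation to the three counts on the line above: identity is matches divided by all aligned positions, and similarity adds the conservative substitutions to the numerator. This relation is inferred from the printed numbers and is not stated by the source. The Python sketch below (the function name is hypothetical) reproduces 55.2 and 72.7 from the counts here, and 36.8 and 61.4 from the counts reported for the Bacillus megaterium hit in Example 1140 (21 matches, 22 mismatches, 14 conservative substitutions).

```python
def alignment_percentages(matches, mismatches, conservative):
    """Return (%identity, %similarity), each rounded to one decimal place.

    Assumes the aligned length is the sum of the three counts, which is an
    inference from the figures printed in this document.
    """
    aligned = matches + mismatches + conservative
    identity = 100.0 * matches / aligned
    similarity = 100.0 * (matches + conservative) / aligned
    return round(identity, 1), round(similarity, 1)


print(alignment_percentages(190, 94, 60))  # histidine kinase hit
print(alignment_percentages(21, 22, 14))   # Bacillus megaterium hit
```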

42 72 102 132 162 192 222 252
VIWLSTKNNW*WTAIQFP*PINHLTCFGY*QI I * IVFFQKQSFMNVSGAKNF*MTLIL*MFISMPYAMTLLNLVQTIP

282 312 342 372 402 432 462 492
QLSKQFGD*GIN*PJSIKMKIiKYYIVIGYLISMLITOAGVFFG
MKLKSYILVGYIISTLLTILWFWAVQKMLIAKGEIYFLLGMTIVASLVGAGISLFLLLPVFTS
10 20 30 40 50 60

522 552 582 612 642 672 702 732
LKKLKQKMKDISQRCFDTKAQICSPQEFKDLETAFNQMSSELESTFKSLNESEREKTMMIAQLSHDIKTPITSIQSTVEG
LGKLKEHAKRVAAKDFPSI^EVQGPvEFQQLGQTFNEMSHDLQVSFDSLEESEREKGLMIAQLSHDIKTPITSIQATVEG
80 90 100 110 120 130 140

762 792 822 852 882 912 942 972
IBDGIISEEEVNYYLNTISRQTNRLNHLVEELSFITLETMSDTAEPHKEETIYLDKLLIDILSEFQLVFEKENRQVMIDV
ILDGIIKESEQAHYlATIGRQTERIiNKLVEELNFLTLKTARNQvETTSKDSIFLDKLLIECMSEFQFLIEQERRDVHLQV
160 170 180 190 200 210 220

1002 1032 1062 1092 1122 1152 1182 1212
APDVSKLSSQYDKLSRILL^LISNAXKYSDPGSPLTIKAYSNRQDIVIDIIDQGYGIKDEDIASIFMRLYRVESSRNMKT

1242 1272 1302 1332 1362 1392 1422 1452
GGHGLGLYIARQIAHQLNGDILVESQYQKGSKFSLVLKLQK*LGIIPSYFL*CFYKRLSAQ*FGKEGDRYRI)IRN*RL*G
GGHGLGLAIARELAHQLGGEITVSSQYGLGSTFTLVLIJLSGSENICA
320 330 340 350



SEQ ID 8738 (GBS28) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 3; MW 64kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 5; MW 38.8kDa) and in Figure 157
(lane 9-11; MW 39kDa).

GBS28-His was purified as shown in Figure 221, lane 6-7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1140 

A DNA sequence (GBSx1216) was identified in S.agalactiae <SEQ ID 3537> which encodes the amino
acid sequence <SEQ ID 3538>. Analysis of this protein sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.70 Transmembrane
INTEGRAL Likelihood = -7.59 Transmembrane
INTEGRAL Likelihood = -6.48 Transmembrane
INTEGRAL Likelihood = -5.57 Transmembrane
INTEGRAL Likelihood = -1.33 Transmembrane

Final Results

bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9575> which encodes amino acid sequence <SEQ ID 9576>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA79984 GB:Z21972 ORF1 [Bacillus megaterium]
Identities = 35/119 (29%), Positives = 62/119 (51%), Gaps = 15/119 (12%)

Query: 142 SSFRLLLSGNLILAPVLIWSSLITTKAVIKLV QQYYSYSISTLVFYTQLESGNYEG 198

+SF+L+ +++ A + + S L+ +IK + QQ++ + YT LE+
Sbjct: 105 TSFKLI-GASILQAIFIFLWSLLLIIPGIIKAIAYSQQFFL--LKDHPEYTVLEA 156

Query: 199 PSKVLVASRELMNGNKLRLFLLDLSFIGWQFLTIFSFGLVYIYLLPYQTTARLIFYRNI 257
+ S++ M G K + FL+ LSFIGW L +F+ G+ ++L+PY T FY +

Sbjct: 157 ITESKKRMKGLKWKYFLMHLSFIGWGILCMFTLGIGLLWLIPYAGTTTAAFYEEL 211



A related DNA sequence was identified in S.pyogenes <SEQ ID 3539> which encodes the amino acid 
sequence <SEQ ID 3540>. Analysis of this protein sequence reveals the following: 



Possible site: 54
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-10 Transmembrane 148 - 164 ( 143 - 170)
INTEGRAL Likelihood = -8 Transmembrane 114 - 130 ( 101 - 141)
INTEGRAL Likelihood = -6 Transmembrane 60 - 76 ( 49 -
INTEGRAL Likelihood = -3 Transmembrane 21 - 37 ( 21 -
INTEGRAL Likelihood = -2 Transmembrane 222 - 238 ( 221 -

Final Results

bacterial membrane --- Certainty=0.5034 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA79984 GB:Z21972 ORF1 [Bacillus megaterium]
Identities = 63/220 (28%), Positives = 100/220 (44%),



62 LGLILSLFILSASFTMI-DWRHFRQK/SFAESTTAFSKZFFGNLLVLAITKWLFFLIWS 120 

+ L+L LF+++ F +1 +V+ + T + F + +A+ L S 

22 VSLMLLLFLINLVFPLIVEVIGSGGFSEWLKQEETPLWSDIFSMVFSIALIP LTIS 77 

121 LIWFF GLFIFLSGLSAFLVNAKSGSSTVISLIFLLFGAVLSLIGFGI 167 

WF+ 1+ G ++F + G+S +■ ++ L+ +L + G 
78 TTWFYLNLVREGNPGIPEVFAIYKDGKTSFKL IGASILQAIFIFLWSLLLIIPG 131 

168 YINRYYAYSLSEYLLYDEVKEGTYIX3aiAVIETSVAMMKGYKWKLFFLQLSFTGWFLLNI 227 

I + AYS +LL D E T L AI S MKG KWK F + LSF GW +L + 

132 -IIKAIAYSQQFFLLKDH-PEYTVLEAIT ESKKRMKGLKWKYFLMHLSFIGWGILCM 185 

228 VTFGLLNIYLLPYFTTANVIFYDQLKKRFKDKDD- -PIEG 265 

T G+ ++L+PY T FY++L +D DD IEG 
187 FTLGIGLLWLIPYAGTTTAAFYEELIVPQEDIDDDQQIEG 226

An alignment of the GAS and GBS proteins is shown below.

Identities = 87/254 (34%), Positives = 137/254 (53%), Gaps = 10/254 (3%)

Query: 16 MTNSEIKNEAKTILSNLQGKNQLFLLPILLSIITLYISFYYQYN NMTLLDFFVPL 70 

M+ IK +A+ L NL GK LFL+P LL + I + Y ++L + PL

Sbjct: 1 MSIKAIKGQARDTLKNLSGKYLLFLIPTLLFMFHFGIEIHQGYVLSSGIEVSLAASYFPL 60 

Query: 71 PWFFYTLFIISVSFVMLDWKNQKLl^FSDNTYVFSSHIFWKLLSVLVLKGLILSFFY 130 
+ +LFI+S SF M+DW++ + V F+++T FS F LL + + K L + 
Sbjct: 61 LLGLILSLFILSASFTMIDWRHFRQKVSFASSTTAFSIffiFFGNLLvIAITKMjFFLIWS 120

Query: 131 LLSTFGLLI I ISSFRLLL SGNLILAPVLIWSSLITTKAVIKLVQQYYSYSISTL 185 

L+ FGL I +S L + +++ + ++ ++++ + +YY+YS+S 

Sbjct: 121 LIWFFGLFIFLSGLSAFLWAKSGSSTVISLIFLLFGAVLSLIGFGIYINRYYAYSLSEY 180 

Query:
Sbjct:

Query: 246 QTTARLIFYRNITK 259

TTA +IFY + K
Sbjct: 241 FTTANVIFYDQLKK 254 



A related GBS gene <SEQ ID 8739> and protein <SEQ ID 8740> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: -11.32 
GvH: Signal Score (-7.5): -5.39 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence 



ALOM program value: -7.71 threshold:
INTEGRAL Likelihood = -7 Transmembrane 125 - 141
INTEGRAL Likelihood = -7 Transmembrane 38 - 54
INTEGRAL Likelihood = -6 Transmembrane 146 - 152 ( 143 - 174)
INTEGRAL Likelihood = -5 Transmembrane 72 - 88
INTEGRAL Likelihood = -1 Transmembrane 229 - 245 ( 227 - 245)



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF00498(901 - 1071 of 1383)
EGAD|19922|20421(155 - 211 of 226) hypothetical protein {Bacillus megaterium}
GP|288299|emb|CAA79984.1||Z21972 ORF1 {Bacillus megaterium} PIR|S32215|S32215 hypothetical
protein 1 - Bacillus megaterium
%Match = 4.8
%Identity = 36.8 %Similarity = 61.4
Matches = 21 Mismatches = 22 Conservative Sub.s = 14

741 771 801 831 861 891 921 951
LIIISSFRLLLSGNLILAPVLIWSSLITTKAVIKLVQQYYSYSISTLVFYTQLESGNYEGPSKVLVASRELMNGNKLRL
GIPEVFAIYKDGKTSFKLIGASILQAIFIFLWSLLLIIPGIIKAIAYSQQFFLLKDHPEYTVLEAITESKKRMKGLKWKY
110 120 130 140 150 160 170

981 1011 1041 1071 1101
FLLDLSFIGWQFLTIFSFGLVYIYLLPYQTTARLIFYRNITKNS*E
FLMHLSFIGWGILCMFTLGIGLLWLIPYAGTTTAAFYEELIVPQEDIDDDQQIEG

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1141 

A DNA sequence (GBSx1217) was identified in S.agalactiae <SEQ ID 3541> which encodes the amino
acid sequence <SEQ ID 3542>. This protein is predicted to be tRNA-guanine transglycosylase (tgt).
Analysis of this protein sequence reveals the following:

Possible site: 54
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3706 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9577> which encodes amino acid sequence <SEQ ID 9578>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.



Query: 12 MTDHPIKYRLIKQEKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELKKMGSGI 71 

M + PI+Y IK+ K TGARLG++ TPHG+F TP+FMPVGT ATVKT SPEELK M +GI 
Sbjct: 1 MAEQPIRYEFIKECKQTGARLGKVHTPHGSFETPVFMPVGTIiATVKTMSPEELKAMDAGI 60 

ILSNTYHLWLRPG +++ +AGGLHKFMNWD+AILTDSGGFQV+SL+ RNI EEGV F+N 
Sbjct: 61 ILSNTYHLWLRPGQDIVKEAGGLiHKFMNWDRAILTDSGGFQVFSLSKFRNIEEEGVHFRN 120 

Query: 132 HLNGAKMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGLNAH 191 

HLNG K+FLSPEKA+ IQN LGSDIMM+FDECP + YDY+K+S+ERTSRWAER IiNAH 
Sbjct: 121 HMGDKLFLSPEKAMEIQNALGSDIMMftFDECPPYPAEYDYMKRSVERTSRWAERCLNAH 180 

Query: 192 RRPHDQGLFGIVQGAGFEDLRRQSARDLVSMDFPGYS-GGLAVGETHDEMNAVLDFTVPM 251 

R +QGLFGIVQG +EDLR QSA+DL+S+DFPGY+IGGL+VGE D MN VL+FT P+ 
Sbjct: 181 NRQDEQGLFGIVQGGEYEDLRTQSAKDLISLDFPGYAIGGLSVGEPKDVMNRVLEFTTPL 240 



Sbjct: 241 LPKDKPRYLMGVGSPDALIDGAIRGTOKFDCTLETRIARNGTVFTAEGRLNMKNAKFERD 300 

Query: 312 FTPLDPNCDCYTCKNYTPAYIRHLLKADETFGIRLTSYHNIiYFLVNLMKDVRQAIMDDNIj 371 

F P+D CDCYTCKNYTRAYIRHL++ +ETFG+RLT+YHNL+FL++LM+ VRQAI +D L 
Sbjct: 301 FRPIDEECDCYTCKNYTRAYIRHLIRCNETFGLRLTTYHNLHFLLHLMEQVRQAIREDRL 360 

Query: 372 LEFRQDFMERYGYGMNN 388 

+FR++F ERYGY N 
Sbjct: 361 GDFREEFFERYGYNKPN 377 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3543> which encodes the amino acid 
sequence <SEQ ID 3544>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2590 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 351/380 (92%), Positives = 368/380 (96%)

Query: 12 MTDHPIKYRLIKQEKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELKEMGSGI 71

MTD+PIKYRLIK EKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELK +GSGI
Sbjct: 1 MTDYPIKYRLIKAEKHTGARLGEIITPHGTFPTPMFMPVGTQATVKTQSPEELKAIGSGI 60

Query: 72 ILSNTYHLWLRPGDELIAKAGGLHKFMNWDQAILTDSGGFQVYSLADSRNITEEGVTFKN 131 

ILSNTYHLWLRPGDELIA++GGLHKFMNWDQ ILTDSGGFQVYSLADSRNITEEGVTFKN 
Sbjct: 61 ILSNTYHLWLRPGDELIARSGGLHKFMNWDQPILTDSGGFQVYSLADSRNITEEGVTFKN 120 

Query: 132 HLNGAKMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGLNAH 191 

HLNG+KMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGL AH 
Sbjct: 121 HLNGSKMFLSPEKAISIQNNLGSDIMMSFDECPQFYQPYDYVKKSIERTSRWAERGLKAH 180 

Query: 192 RRPHDQGLFGIVQGAGFEDLRRQSARDLVSMDFPGYSIGGLAVGETHDEMNAVLDFTVPM 251 

RRPHDQGLFGIVQGAGFEDLRRQSA DLV+MDFPGYSIGGLAVGE+H+EMNAVLDFT P+
Sbjct: 181 RRPHDQGLFGIVQGAGFEDLRRQSAADLVAMDFPGYSIGGLAVGESHEEMNAVLDFTTPL 240



Query: 252 LPNDKPRYLMGVGAPDSLIDAVIRGVDMFDCVLPTRIARNGTCMTSQGRLVVKNAKFAED 311
LP +KPRYLMGVGAPDSLID VIRGVDMFDCVLPTRIARNGTCMTS+GRLV+KNAKFAED
Sbjct: 241 LPENKPRYLMGVGAPDSLIDGVIRGVDMFDCVLPTRIARNGTCMTSEGRLVIKNAKFAED 300

Query: 312 FTPLDPNCDCYTCKNYTRAYIRHLLKADETFGIRLTSYHNLYFLVNLMKDVRQAIMDDNL 371 

FTPLD +CDCYTC+NY+RAYIRHLIJKADETFGIRLTSYHNLYFDVNLMK VRQAIMDDNL 
Sbjct: 301 FTPIJJHDCDCYTCCjNYSRAyiRHLIjKADETFGIRLTSYHNLYFLVNLMKXVRQAIMDDNL 360 


Query: 372 LEFRQDFMERYGYGMNNRNF 391 

LEFRQDF+ERYGY +NRNF 
Sbjct: 361 LEFRQDFLERYGYNKSNRNF 380 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1142 

A DNA sequence (GBSx1218) was identified in S.agalactiae <SEQ ID 3545> which encodes the amino
acid sequence <SEQ ID 3546>. Analysis of this protein sequence reveals the following:

Possible site: 56
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2479 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9303> which encodes amino acid sequence <SEQ ID 9304>
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10795> which encodes amino
acid sequence <SEQ ID 10796> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB16256 GB:Z99164 hypothetical protein [Schizosaccharomyces pombe]
Identities = 42/91 (46%), Positives = 62/91 (67%), Gaps = 3/91 (3%)

Query: 6 FGIGLDSSSRCYHYHTKLDIVALKCAVCQKYYACYKCHDALEEHCFAA-TKSDETFP-VL 63

+G +D+ +RC+HYH+K D+VAL+C C+K+YAC++CHD L H F K+ P V+
Sbjct: 13 YGKLVDNETRCFHYHSKADVVALRCGQCEKFYACFQCHDELNTHPFLPWRKAKFHIPCVI 72

Query: 64 CGSCRQMLTLKEYK-TGFCPYCRMLFNPNCQ 93

CG+C+ LT++EY+ T C YC FNP C+
Sbjct: 73 CGACKNSLTVEEYRSTVHCKYCNHPFNPKCK 103

A related DNA sequence was identified in S.pyogenes <SEQ ID 3547> which encodes the amino acid 
sequence <SEQ ID 3548>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2769 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 55/93 (59%), Positives = 62/93 (66%)

Query: 2 MQEYFGIGLDSSSRCYHYHTKLDIVALKCAVCQKYYACYKCHDALEEHCFARTKSDETFP 61
M + FGI LD RC HYHT LDIV LKCA CQ YYACY CHD L +H F T ET P
Sbjct: 1 MTDCFGIDLDQEYRCLHYHTPLDIVGLKCASCQTYYACYHCHDQLTDHAFVPTGHQETSP 60

Query: 62 VLCGSCRQMLTLKEYKTGFCPYCRMLFNPNCQR 94
V+CG CR++L+ EY G CPYC+ FNP C R
Sbjct: 61 VICGHCRKLLSRAEYGCGCCPYCQSPFNPACHR 93

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1143 

A DNA sequence (GBSx1219) was identified in S.agalactiae <SEQ ID 3549> which encodes the amino
acid sequence <SEQ ID 3550>. This protein is predicted to be transport protein. Analysis of this protein
sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.45 Transmembrane 300 - 316 ( 292 - 321)
INTEGRAL Likelihood = -1.17 Transmembrane 255 - 281 ( 265 - 281)

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10113> which encodes amino acid sequence <SEQ ID
10114> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF12002 GB:AE002075 transport protein, putative [Deinococcus radiodurans]
Identities = 108/295 (36%), Positives = 174/295 (58%), Gaps = 4/295 (1%)

Query: 31 GAWINLVNPSQEESEQVADQFGIDIDDLRAPLDVEETSRISVEDDYTLVIVDVPTYEERN 90
G WI+ P+ EE +V+ + G+++D L+ PLD +E SR ED L+I+ +
Sbjct: 21 GCWIDAAaPTTEELARVSRETGLELDYLKYPLDPDERSRFEREDGQLLIIMQTSYRLAED 80

Query: 91 NKSYYMTIPMGIIVTDNAVITTC-LEHLTLFDHFYRRRVKNFYTFMKTRFVFQLLYRNAE 149
+ Y T+P+GI+ TD+ ++T C LE + V+ T K R QL RNA+
Sbjct: 81 SDIPYDTVPLGILHTDHCLVTVCSLEENPVViaOWSGLVRRVSTvKKNRLTLQLFLRNAQ 140

Query: 150 LYLQALRTIDRQSDKIEAQLESATRNEQLIDMMELEKSIVYLKASLKFNERIVKKLTSST 209
+L +R I+++ D IE ++E+ATRN +L+D+++LEKS+VY DEC NE +++++
Sbjct: 141 RFLIDWQINKRVDAIEDKMENATP^IRELLDLLKLEKSLVyFITGLKANEAMMERVKRDR 200

Query: 210 SSLKKYIEDEDLLEDTLIETQQAIEMANIYENVLNAMTETTASIIGNNQNTIMKTLALVT 269
+ Y ED +LL+D LIE QAIEMA+I N+L +M AS+I NN N ++K L + T
Sbjct: 201 I-FE^EDSELLDDVLIENLQAIE^3A^ 259

Query: 270 MTLDIPTVIFSAYGMNFQNNWMPLNGLAHGFIYVVLLAFLMSSFVVFYFIRKKWF 324
+ + IPT++ +GMN + +P + +GF V+ +A ++S + F F R K F
Sbjct: 260 ILVAIPTLVSGFFGMNVEG--LPFSDSPYGFWLVMTVAMGIASLLAFLFYRWKVF 312

A related DNA sequence was identified in S. pyogenes <SEQ ID 715> which encodes the amino acid 
sequence <SEQ ID 716>. Analysis of this protein sequence reveals the following: 

Possible site: 61
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.81
INTEGRAL Likelihood = -1.28

Final Results

bacterial membrane --- Certainty=0.4524 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 272/314 (86%), Positives = 296/314 (93%)

Query: 11 MKQMFLSTAIEFKEIETFEPGAWINLVNPSQEESEQVADQFGIDIDDLRAPLDVEETSRI 70
MKQMFLS+AIEFKEIETFEPGAWI LVNPSQEES ++ADQF IDI DLRAPLDVEETSRI
Sbjct: 1 MKQMFLSSAIEFKEIETFEPGAWIKLVNPSQEESMKIADQFNIDISDLRAPLDVEETSRI 60

Query: 71 SVEDDYTLVIVDVPTYEERNNKSYYMTIPMGIIVTDNAVITTCLEHLTLFDHFYRRRVKN 130
+VEDDYTL+IVDVP YEERNNKSYY+T+P+GIIVT+NAVITTCL +TLFDHF+ RRVKN
Sbjct: 61 AVEDDYTLIIVDVPIYEERNNKSYYITMPLGIIVTENAVITTCLHDMTLFDHFHNRRVKN 120

Query: 131 FYTFMKTRFVFQLLYRNAELYLQALRTIDRQSDKIEAQLESATRNEQLIDMMELEKSIVY 190
FYTFMKTRFVFQ+LYRNAEL+L ALRTIDRQS+++EAQLE+ATRNE+LIDMMELEKSIVY
Sbjct: 121 FYTFMKTRFVFQILYRNAELFLTALRTIDRQSERLEAQLEAATRNEELIDMMELEKSIVY 180

Query: 191 LKASLKFNERIVKKLTSSTSSLKKYIEDEDLLEDTLIETQQAIEMANIYENVLNAMTETT 250
LKASLKFNERIVKKL+SSTSSLKKYIEDEDLLEDTLIETQQAIEMA IYENVLNAMTETT
Sbjct: 181 LKASLKFNERIVKKLSSSTSSLKKYIEDEDLLEDTLIETQQAIEMAGIYENVLNAMTETT 240

Query: 251 ASIIGNNQNTIMKTLALVTMTLDIPTVIFSAYGMNFQNNWMPLNGLAHGFIYVVLLAFLM 310
ASII NNQNTIMKTLAL+TM LDIPTVIFSAYGMNFQNNW+PLNGL H F Y+ L+A L+
Sbjct: 241 ASIINNNQNTIMKTLALMTMALDIPTVIFSAYGMNFQNNWLPLNGLEHAFWYITLIAMLL 300

Query: 311 SSFVVFYFIRKKWF 324

Sbjct:

SEQ ID 3550 (GBS257) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 3; MW 35kDa), in Figure 169 (lane 9 & 10; MW 50kDa) and in Figure 
239 (lane 2; MW 50kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of
total cell extract is shown in Figure 48 (lane 6; MW 60kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 



PCT/GB01/04789 



-1282- 

Example 1144 

A DNA sequence (GBSx1220) was identified in S.agalactiae <SEQ ID 3551> which encodes the amino
acid sequence <SEQ ID 3552>. Analysis of this protein sequence reveals the following:

Possible site: 29
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-12.26 Transmembrane 158 - 174 ( 151 - 182)
INTEGRAL Likelihood = -6.37 Transmembrane 93 - 109 ( 91 - 111)
INTEGRAL Likelihood = -5.68 Transmembrane 188 - 204 ( 184 - 205)
INTEGRAL Likelihood = -0.85 Transmembrane 118 - 134 ( 118 - 134)

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3553> which encodes the amino acid 
sequence <SEQ ID 3554>. Analysis of this protein sequence reveals the following: 

Possible site: 52 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -6.95 Transmembrane 92 - 108 ( 88 - 110)
INTEGRAL Likelihood = -6.69 Transmembrane 153 - 169 ( 151 - 177)
INTEGRAL Likelihood = -2.34 Transmembrane 183 - 199 ( 183 - 200)

Final Results

bacterial membrane --- Certainty=0.3781 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below.

Identities = 135/217 (62%), Positives = 167/217 (76%), Gaps = 1/217 (0%)

Query: 1 MTLQDLTKKNQEFVHIATNQLLADGKSDAEIKAILEEHLPEIIDNQKKGITARSLLGAPT 60
M LQ+LTKKNQEF+H ATN+L+ DGKSD +IK ILEE +P I++NQKKG+TAR+LLG PT
Sbjct: 1 MELQELTKKNQEFIHTATNKLIQDGKSDEDIKLILEEAIPAILENQKKGVTARNLLGTPT 60

Query: 61 TWAASFTERPEDKARVSVQKNTNPWLMWLDTSLLFLGLVTALNGLMLLFGQSNVNTGLIS 120
WAASF++ P KA KNTNPWLMWLDTSLLF+G+V LNG+M F + TGLIS
Sbjct: 61 AWAASFSQDPSQKA-AETDKNTNPWLMWLDTSLLFIGIVALLNGIMTFFNTNATVTGLIS 119

Query: 121 ILTLGFGGGAAMYVTYYYIYRHMGKPKSERPGWLKSFAVLALVMLVWFALFAWPLLPAT 180
+L LGFGGGA+MY TYY+IYRH+GK KS RP W K A L+L ML+W AL++ LP +
Sbjct: 120 LLALGFGGGASMYATYYFIYRHLGIOKSLRPSWFKIIAaLSLAMLIWIALYSATAFLPTS 179

Query: 181 INPKLPEWLFIIALASFGLRFYLQRKYNIQSSMAPV 217
+NP+LP + L II S LR+YLQRKYNIQ++M+PV
Sbjct: 180 LNPQLPPLALLIIGGVSLALRYYLQRKYNIQNTMSPV 215

A related GBS gene <SEQ ID 10787> and protein <SEQ ID 10788> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -9.94 
GvH: Signal Score (-7.5): -3.66 

Possible site: 29 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 4 value: -12.26 threshold: 0.0 

INTEGRAL Likelihood =-12.26 Transmembrane 158 - 174 ( 151 - 






INTEGRAL Likelihood = -6.37 Transmembrane 93 - 109 ( 91 - 111) 

Likelihood = -5.68 Transmembrane 188 - 204 ( 184 - 205) 

Likelihood = -0.85 Transmembrane 118 - 134 ( 118 - 134) 
PERIPHERAL Likelihood =8.43 50 
modified ALOM score: 2.95 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
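Each "Final Results" block lists three candidate localizations with a certainty value, and the reported prediction is the one marked Affirmative, which in the blocks shown here is also the one with the highest certainty. A minimal sketch of reading such a block follows. The helper names are hypothetical and the regular expression is an assumption about the line layout; it tolerates the stray spaces that scanning leaves inside the numbers.

```python
import re

# One "Final Results" line: location, certainty, verdict.
# \W* skips the dashes or blanks between the location and "Certainty".
LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for one Final Results block."""
    results = {}
    for m in LINE.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))  # "0 . 5904" -> 0.5904
        results[m.group(1)] = (certainty, m.group(3).strip())
    return results


def predicted_location(text):
    """Location with the highest certainty in the block."""
    results = parse_final_results(text)
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
print(parse_final_results(block))
print(predicted_location(block))
```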



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1145 

A DNA sequence (GBSx1221) was identified in S.agalactiae <SEQ ID 3555> which encodes the amino
acid sequence <SEQ ID 3556>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1348 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1146 

A DNA sequence (GBSx1222) was identified in S.agalactiae <SEQ ID 3557> which encodes the amino
acid sequence <SEQ ID 3558>. This protein is predicted to be excinuclease ABC (uvrA). Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10111> which encodes amino acid sequence <SEQ ID
10112> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC67271 GB:AF017113 excinuclease ABC subunit A [Bacillus subtilis] 
Identities = 642/940 (68%) , Positives = 785/940 (83%) , Gaps = 3/940 (0%)

Query: 9 DKLMIRGiARAHI&KNISVDIPRDKLVvVTGLSGSGKSSIAFDTIYAEGQRRYVESLSAVA 68 
D++ ++GARAHNLKNI V IPRD+LVVVTGLSGSGKSSLAFDTIYAEGQRRYVESLSAYA
Sbjct: 4 DRIEVKGARAHNLKNIDOTIPRDQLVVVTGLSGSGKSSLAFDTIYAEGQRRYVESLSAYA 63

Query: 69 RQFLGNMEKPDVDSIDGLSPAISIDQKTTSKN?RSTVGT\TE;iOyLRIiI.YARVGTPYCI 128

RQFLG M+KPDVD+I+GLSPAISIDQKTTS+NPRSTVGTVTEI DYLRLLYARVG P+C 
Sbjct: 64 RQFLGQMDKPDVDAIEGLSPAISIDQ^TSRWPRSTVGTVTEIYDYLRLLYARVGKPHCP 123 

Query: 129 NGHGAITASSVEQIVDKVLALPERTKMQILAPIIRRKKGQHKSTFEKIQKDGYVRVRIDG 188 

IT+ ++EQ+VD++L PERTK+Q+LAPI+ +KG H E+I+K GYVRVRIDG 
Sbjct: 124 EHGIEITSQT1EQMVDRILEYPERTKLQVIAPIVSGRKGAHVKVLEQIRKQGYVRVRIDG 183 

Query: 189 DIHDVTEVPELSKSKMHNIDIWDRLINKEGIRSRLFDSVEAALRLSDGYWIDTMDGNE 248 

++ ++++ EL K+K H+r++V4DR++ KEG+ +RL DS+E ALRL +G V+ID + E 
Sbjct: 184 EMAELSDDIELEKNKKHSIEWIDRIWKEGVAARLSDSLETALRLGEGRVMIDVIGEEE 243 

Query: 249 LLFSEHYSCPECGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVDIDLVIPDRSKTLRE 308 

L+FSEH++CP CGF++ ELEPRLFSFN-PFG+CPTCDGLG+KLEVD DLVIP++ +L+E 
Sbjct: 244 LMFSEHHACPHCGFSIGELEPRLFSFNSPFGACPTCDGLGMKLEVDADLVIPNQDLSLKE 303

Query: 309 GALVPWNPISSNYYPTMLEQAMTQFGVDMDTPFEKLSKAEQDLALYGSGEREFHFHYIND 368 

A+ PW PISS YYP +LE T +G+DMD P + h K + D LYGSG+ +F Y ND 
Sbjct: 304 NAVAPWTPISSQYYPQLLEAVCTHYGIDMDVPVKDLPKHQLDKVLYGSGDDLIYFRYEND 363 

Query: 369 FGGERMIDLPFEGVVtMINRRYHETWSDYTRNVMRETYMNELKCOTCHGYRLNDQALCVRV 428 

FG R ++ FEGV+ KI RRY ET SD+ R M +YM++ C TC GYRL +AL V + 
Sbjct: 364 FGQIREGEIQFEGVLRNIERRYKETGSDFIREQMEQYMSQKSCPTCKGYRLKKEALAVLI 423 

Query: 429 GGEEGLNIGQVSDLSIADHLELLETLRLSSNEQLIARPIIKEIHDRLSFLHHVGIiNYLNL 488 

+G 4-IG++++LS4-AD L + L LS 4- IA I++EI +RLSFL+ VGL+YL L 
Sbjct: 424 DGRHIGKITELSVADALAFFKDLTLSEKDMQIANLILREIVERLSFLDKVGLDYLTL 480 



Sbjct: 481 SRAAGTLSGGEAQR1R1ATQIGSRLSGVLYILDEPSIGLHQRDNDRLISALICNI1RDLGNT 540 

Query: 549 LIVVEHDEDTMMAADWLIDVGPGAGAFGGEIVASGTPKQVAKOTKSITGQYLSGKKVIPV 608 

LIWEHDEDTMMAAD+LID+GPGAG GG+++++GTP++V ++ S+TG YLSGKK IP+ 
Sbjct: 541 LIWEHDEDTMMAADYLIDIGPGAGIHGGQVISAGTPEEVMEDPNSLTGSYLSGKKFIPL 600 

Query: 609 PSERRVGNGRFLEIKGAAENNLQNLDVKFPLGKFIAVTGVSGSGKSTI.TNSILKKAVAQK 668 

P ERR +GR++EIKGA+ENNL+ ++ KFPLG F AVTGVSGSGKSTL+N IL KA+AQK 
Sbjct: 601 PPERRKPDGRYIEIKGASENNLKKVNAKFPLGTFTAVTGVSGSGKSTLWEILHICALAQK 660 

Query: 669 UffiWSDKPGKYVSLEGIEYVDRLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNEAK 728

L++ KPG + ++G++++D++IDIDQ+PIGRTPRSNPATYTGVFDDIRD+FAQTNEAK 
Sbjct: 661 LHKAKAKPGSHKEIKGLDHLDKVIDIDQAPIGRTPRSNPATYTGVFDDIRDVFAQTNEAK 720 

Query: 729 IRGYKKGRFSFNVKGGRCESCSGDGIIKIEMHFLPDVYVPCEVCHGTRYNSETLEVHYKE 788 

+RGYKKGRFS FNVKGGRCE+C GDGIIKIEMHFLPEVYVPCEVCHG RYN ETLEV YK 
Sbjct: 721 WGYKKGRFSFHVKGGRCEACRGDG1IKIEI4HFLPDVYVPCEVCHGKRYWRETLEVTYKG 780 



Query: 849 ELHKRSTGKSLYILDEPTTGLHADDIARLLIWLDRFVDDGNTVLVIEHNLDVIKTADHII 908 

ELHKRSTG++LYILDEPTTGLH DDIARLL VL R VD+G+TVLVIEHNLD+HCTAD+I+ 
Sbjct: 841 ELHKRSTGRTLYILDEPTTGLHVDDIARLLVVLQRLVDNGDTVLVIEHNLDI I KTADYIV 900 

Query: 909 DLGPEGGIGGGQIVAIGTPEEVAENPKSYTGYYLKEKLAR 948 

DLGPEGG GGG IV& GTPEE+ E +SYTG YLK + R 
Sbjct: 901 DLGPEGGAGGGTIVASGTPEEITEVEESYTGRYLKPVIER 940 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3559> which encodes the amino acid
sequence <SEQ ID 3560>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1138 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 835/940 (88%) , Positives = 896/940 (94%)
Query: 7 MQDKLMIRGARAHNLKNISVDIPRDKLVim'GLSGSGKSSLAFDTIYAEGQRRYVESLSA 66



Query: 67 YARQFLGNMEKPDvDSIDGLSPAISIDQKTTSKNPRSTVGTVTSINDYLRIjLYARVGTPY 126 

YARQFLGNMEKPDVDSIDGLSPAISIDQKTTSKNPRSTVG7VTEINDYLRLLYARVGTPY 
Sbjct: 71 YARQFLGNMEKPDVDSIDGLSPAISIDQKTTSKNPRSTVGTVTEINDYLRLLYARVGTPY 130

Query: 127 CINGHGAITASSVEQIVDKvTjALPERTKMQILAPIIRRKKGQHKSTFEKIQKDGYVRVRI 186 

CINGHGAITASS EQIV++VLALPERT+MQILAP++RRKKGQHK+ FEKIQKDGYVRVR+ 
Sbjct: 131 CINGHGAITASSAEQIVEQVLALPERTRMQIIAPVVRRKKGQHKTVFEKIQKDGYVRVRV 190 

Query: 187 DGDIHDVTEVPELSICSICMHNIDIWDRLINICEGIRSRLFDSVEAALRLSDGYWIDTMDG 246 

DGDI DVTEVPELSKSKMHNI++V+DRL+NK+GIRSRLFDSVEAALRL DGY++IDTMDG 
Sbjct: 191 DGDIFDVTEVPELSKSKMHNIEWIDRLVNKDGIRSRLFDSVEAALRLGDGYLMIDTMDG 250 

Query: 247 NELLFSEHYSCPECGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVDIDLVIPDRSKTL 306 

NELLFSEHYSCP CGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVD+DLV+PD SK+L 
Sbjct: 251 NELLFSEHYSCPVCGFTVPELEPRLFSFNAPFGSCPTCDGLGIKLEVDLDLWPDPSKSL 310 

Query: 307 REGAIJVPWNPISSNYYPTMLEQA^lTQFGvDMDTPFEKLSKAEQDLALYGSGEREFHFHYI 366 

REGAL PWNP1SSNYYPTMLEQAM FGVDMDTPFB L++ E+DL LYGSG+REFHFHY+ 
Sbjct: 311 RECmiAPWNPISSNYYPTMLEQAMASFGVDMDTPFEALTEEERDLVLYGSGDREFHFHYV 370 

Query: 367 NDFGGERNIDLPFEGWNNINRRYHETNSDYTRNVMREYMNELKCNTCHGYRLNDQALCV 426 

NDFGGERNID+PFEGW N+NRRYHETNSDYTRNVMR YMNEL C TCHGYRLNDQALCV 
Sbjct: 371 NDFGGERNIDIPFEGVvTNvNRRYHETNSDYTR^T^RGYT^NELTCATCKGYRIiNDQALOT 430 

Query: 427 RVGGEEGLNIGQVSDLSIADHLELLETLRLSSNEQLIARPIIKEIHDRLSFLNNVGLNYL 486 

VGGEEG +IGQ+S+LSIADHL+LLE L L+ NE IA+PI+KEIHDRL+FLNNVGLNYL 
Sbjct: 431 HVGGEEGTHIGQI SELS IADHLQLLEELELTENESTIAKPIVKEIHDRLTFLNNVGLNYL 490 

Query: 487 NLSRSAGTLSGGESQRIRLATQIGSNLSGVLYVLDEPSIGLHQRDNDRLIDSLKKMRDLG 546 

LSR+AGTLSGGESQRIRLATQIGSNLSGVLY+LDEPSIGLHQRDNDRLI+SLKKMRDLG 
Sbjct: 491 TLSRAAGTLSGGESQRIRLATQIGSNLSGVLYILDEPSIGLHQRDNDRLIESLKKMRDLG 550 

Query: 547 NTLIVVEHDEDTMMAADWLIDVGPGAGAFGGEIVASGTPKQVAKNTKSITGQYLSGKKVI 606 

NTLIWEHDEDTMM ADWLIDVGPGAG FGGEI ASGTPKQVAKN KSITGQYLSGKK I 
Sbjct: 551 NTLIWEHDEDTMMQADWLIDVGPGAGEFGGEITASGTPKQVAKNKKSITGQYLSGKKFI 610 

Query: 607 PVPSERRVGNGRFLEIKGAAEMOT.QNLDVKFPLGKFIAVTGVSGSGKSTLINSILKKAVA 666 
PVP ERR GNGRF+EIKGAA+NNLQ+LDV+FPLGKFIAVTGVSGSGKSTL+NSILKKAVA 

Query: 667 QKMIRNSDKPGKYVSLEGIEYVDRLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNE 726 

QKLNRN+DKPGKY S+ GIE+++RLJDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNE 
Sbjct: 671 QKLNRNADKPGKYHSISGIEHIERLIDIDQSPIGRTPRSNPATYTGVFDDIRDLFAQTNE 730 

Query: 727 AKIRGYKKGRFSFK 
AKIRGYKKGRFSFE 

Sbjct: 731 AKIRGYECKGRFSFNVKGGRCEACSGDGIIKIEMHFLPDVYVPCEVCHGRRYNSETLEVHY 790 

Query: 787 KEKNIAQILDMTVNDAVTFFAAIPKIARKLQTIKDVGLGYVTLGQPATTLSGGEAQRMKL 846 

K KNIA++LDMTV+DA+ FF+AIPKIARK+QTIKDVGLGYVTLGQPATTLSGGEAQRMKL 
Sbjct: 791 KGKNIAEVLDMTVDDALVFFSAIPKIARKICjTIKDVGLGYVTLGQPATTLSGGEAQRMKL 850 

Query: 847 ASELHKRSTGKSLYILDEPTTGLHADDIARLLKvLDRFVDDGNTVLVIEHNLDVIKTADH 906 

ASELHKRSTGKSLYILDEPTTGLH DDIARLLKVL+RFVDDGNTVLVIEHNLDVIK+ADH 
Sbjct: 851 ASELHICRSTGKSLYILDEPTTGLHTDDIARLLKVXERFVDDGNTVLVIEHNLDVIKSADH 910

Query: 907 IIDLGPEGGIGGGQIVAIGTPEEVAENPKSYTGYYLKEKL 946

IIDLGPEGG GGGQIVA GTPEEVA+ +SYTG+YLK KL 
Sbjct: 911 IIDLGPEGGDGGGQIVATGTPEEVAQVKESYTGHYLKVKL 950 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1147 

A DNA sequence (GBSx1223) was identified in S.agalactiae <SEQ ID 3561> which encodes the amino
acid sequence <SEQ ID 3562>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.5161 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12192 GB:Z99106 similar to multidrug resistance protein [Bacillus subtilis] 
Identities = 198/481 (41%) , Positives = 300/481 (62%) , Gaps = 24/481 (4%) 



Query: 9 IHGKPYNRTAMITLLLIATFAGVLNQTSLGTAIPTLMNSFNISLSTAQQATTWFLLANGI 68
I KP+NR+ ++ +LL F +LNQT L TA+P +M FN+ ' + AQ TT F+L NGI
Sbjct: 5 IEQKPFNRSVIVGILLAGAFVAILNQTLLITALPHIMRDFNVDANQAQWLTTSFMLTNGI 64



Query: 69 MIPVSAYIATRFSTKWLYVTSYVVLLIGLLMTTIAPTSNWNLFLVGRIIQAISVGISMPL 128 

+IP++A+L +F+++ L +T+ + G ++ AP N+ + L RIIQA GI MPL 
Sbjct: 65 LI PITAFLIEKFTSRALLITAMS I FTAGTWGAFAP - -NFPVLLTARI ICAAGAGIMMPL 122 

Query: 129 NQVVMVWFPPEQRGAAMGLNGLWGLAPAIGPTLAGWILKQEFHFAGHDLTWRAIFLLP 188 

MQ V + +FP E+RG AMG+ GLV+ APAIGPTL+GW ++ +WR++F + 

Sbjct: 123 MQTVFLTI FP IEKRGQAMGMVGLVI S FAPAI GPTL SGWAVEA FSWRSLFYII 174 

Query: 189 LLILTVTTILSPFVLKDWDNKSVKLEVPSLILSIIGFGSFLWGFTJWATYGWGDIGYVI 248

Query: 429 LSSVAQNIITNWKPSKDLLTMNPLKYANQMLNASLDGFHVSFAIGFVFAVLGLLVSLFLRK 489

L SV N + + +A+L G + +F + V A++G L+S L+K 

Sbjct: 414 LVSVMSNQAAH AGTTNVKHAALHGIWAAFIVAAVIALVGFLLSFTLKK 461 

There is also homology to SEQ ID 46.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1148 

A DNA sequence (GBSx1224) was identified in S.agalactiae <SEQ ID 3563> which encodes the amino
acid sequence <SEQ ID 3564>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.81 Transmembrane 8 - 24 ( 5 - 30)
INTEGRAL Likelihood = -7.32 Transmembrane 36 - 52 ( 31 - 54)

Final Results

bacterial membrane --- Certainty=0.4524 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10109> which encodes amino acid sequence <SEQ ID
10110> was also identified.

A related GBS gene <SEQ ID 8743> and protein <SEQ ID 8744> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 9.52
GvH: Signal Score (-7.5): -3.4
Possible site: 22
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -7.32 threshold: 0.0
INTEGRAL Likelihood = -7.32 Transmembrane 11 - 27 ( 6 - 29)
PERIPHERAL Likelihood = 11.19 130
modified ALOM score: 1.96

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3930 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
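In the reports shown here, each INTEGRAL line gives a likelihood, a 17-residue core segment and a wider extended range in parentheses, and the "value" on the ALOM line matches the most negative INTEGRAL likelihood. The sketch below reads such lines and checks both observations. It is an illustration under those assumptions, with hypothetical function names, and is not the ALOM program itself.

```python
import re

# Example line: "INTEGRAL Likelihood = -7.32 Transmembrane 11 - 27 ( 6 - 29)"
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(text):
    """Return (likelihood, core_start, core_end, ext_start, ext_end) tuples."""
    return [
        (float(m.group(1)),) + tuple(int(g) for g in m.groups()[1:])
        for m in INTEGRAL.finditer(text)
    ]


def alom_value(segments):
    """Most negative likelihood, i.e. the strongest predicted segment."""
    return min(seg[0] for seg in segments)


report = """
INTEGRAL Likelihood = -6.37 Transmembrane 93 - 109 ( 91 - 111)
INTEGRAL Likelihood = -5.68 Transmembrane 188 - 204 ( 184 - 205)
INTEGRAL Likelihood = -0.85 Transmembrane 118 - 134 ( 118 - 134)
"""
segments = parse_integral(report)
for likelihood, start, end, *_ in segments:
    # Print the likelihood, the core boundaries and the core length.
    print(likelihood, start, end, end - start + 1)
print(alom_value(segments))
```

A core length other than 17, or an ALOM value that matches none of the listed likelihoods, is a hint that a digit was misread or that a line was lost at a page break.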

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8744 (GBS29) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 2; MW 25.6 kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 15 (lane 6; MW 51 kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1149

A DNA sequence (GBSx1225) was identified in S.agalactiae <SEQ ID 3565> which encodes the amino
acid sequence <SEQ ID 3566>. This protein is predicted to be aminopeptidase P (pepQ). Analysis of this
protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0724 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 6 RLTRCQTAISQLSCDALLITNLTNIFYLTGFSGTKATVLISPKHRIFVTDSRYALIAKNT 65 

R+ + + + + D+LLIT++ NIFYLTGFSGT TV ++ K IF+TDSRY+ +A+ 
Sbjct: 2 RIEKLKVKMLTENIDSLLITDMKN1FYLTGFSGTAGTVFLTQKRNIFMTDSRYSEMARGL 61 

Query: 66 VREFDI I ISREPLAAILKI IRDDALIAIGFETDISYHMYKHMVEVFEDYRLIEAPSVVEK 125 

++ F+II +R+P++ + ++ +++ + FE + Y -t-K + + L + V + 

Sbjct: 62 IKNFEIIETRDPISLLTELSASESVKNMAFEETVDYAFFKRLSKAATKLDLFSTSNFVLE 121 

Query: 126 LRMIKD 131 

LR IKD 
Sbjct: 122 LRQIKD 127 

There is also homology to SEQ ID 3568. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1150 

A DNA sequence (GBSx1226) was identified in S.agalactiae <SEQ ID 3569> which encodes the amino
acid sequence <SEQ ID 3570>. This protein is predicted to be aminopeptidase P (pepQ-2). Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2508 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA70068 GB:Y08842 aminopeptidase P [Lactococcus lactis] 
Identities = 131/205 (63%), Positives = 163/205 (78%), Gaps = 3/205 (1%) 

Query: 2 LDFIKPDRTTELQVANFLDFRMRELGATGPSFDFIVASGYRSAMPHGVASQKTIQSGETL 61 

L FI+P RT E4+VANFIjDF+MR+L A+G SF+ IVASG RS++PHGVA+ K IQ G+ + 
Sbjct: 149 LRFIEPGRT-EIEVANFLDFKMRDLEASGISFETIVASGKRSSLPHGVATSKMIQFGDPV 207 

Query: 62 TLDFGCYYQHYVSDMTRTIHIGHVTDQEREIYDIVLKSNQAIIGNVKSGMKRCDYDYIiAR 121 

T+DFGCYY+HY SDMTRTI +G V D+ R IY+ V K+N+A+I VK+GM YD + R 
Sbjct: 208 TIDFGCraHYASDMTRTIWGSVDDKmTIYETTOKANEALIKQVKAGMTYAQYDNIPR 267 

Query: 122 QVIENSGYGNHFTHGIGHGMGLDVHEIPYFGKS--EGVIASGMVVTDEPGIYLDMKYGVR 179
+VIE + +G + FTHGIGHG+GLDVHE I PYF +S E + SGMV+TDEPGIYL GVR
Sbjct: 268 EVIEKADFGQYFTHGIGHGLGLDVHEIPYFNQSMTENQLRSGMVITDEEGIYLPEFGGVR 327 

Query: 180 IEDDLLITETGCEVLTSAPKELIVL 204 
IEDDLL+TE GCEVLT APKELIV+

Sbjct: 328 IEDDLLVTENGCEVLTKAPKELIVI 352 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3567> which encodes the amino acid 
sequence <SEQ ID 3568>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1450 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/203 (71%) , Positives = 171/203 (83%) 

Query: 2 LDFIKPDRTTELQVANFLDFRMRELGATGPSFDFIVASGYRSAMPHGVASQKTIQSGETL 61 

LDFIKP TTE +ANFLDFRMR+ GA+G SFD IVASGY SAMPHG AS K IQ+ E+L 
Sbjct: 168 LDFIKPGTTTERDLANFLDFRMRQYGASGTSFDIIVASGYLSAMPHGRASDKVIQNKESL 227 

Query: 62 TLDFGCYYQHYVSDMTRTIHIGHVTDQEREIYDIVLKSNQAIIGNVKSGMKRCDYDYLAR 121 

T+DFGCYY HYVSDMTRTIHIG VTD+EREIY +VL +N+A+I +GM D+D + R 
Sbjct: 228 TMDFGCYYNHWSDMTRTIHIGQVTDEEREIYALVLAANKALIAKASAGMTYSDFDGIPR 287 

Query: 122 QVIENSGYGIfflFTHGIGHG^IIJVHEIPYFGKSEGVIASGMVVTDEPGIYLDNKYGVRIE 181 

Q+I +GYG+ FTHGIGHG+GLD+HE P+FGKSE ++ +GMWTDEPGIYLDNKYGVRIE 
Sbjct: 288 QLITEAGYGSRFTHGIGHGIGLD1HENPFFGKSEQLLQAGMWTDEPGIYLDNKYGVR1E 347 

Query: 182 DDLLITETGCEVLTSAPKELIVL 204 

DDL+IT+TGC+VLT APKELIVL 
Sbjct: 348 DDLVITKTGCQVLTLAPKELIVL 370 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1151 

A DNA sequence (GBSx1227) was identified in S.agalactiae <SEQ ID 3571> which encodes the amino
acid sequence <SEQ ID 3572>. This protein is predicted to be yfhC protein (comEB). Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1401 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05053 GB:AP001511 late competence operon required for DNA 
binding and uptake [Bacillus halodurans] 
Identities = 78/146 (53%) , Positives = 107/146 (72%) 

Query: 1 MNRLSVffiDYFMANAELISKRSTCDRAFVGAVLVKNNRIIATGYNGGVSETDNClffiVGHYM 60 

MNR+SW+ YFMA + L++ RSTC R VGA +V++ RIIA GYNG +S +C + G Y+ 
Sbjct: 1 MNRISWDQYFMAQSHLIMjRSTCTRIjMvGft.TIVRDKRIIAGGYNGSISGGPHCIDEGCYV 60

Query: 61 EDGHCIRTVHaE^ALIQC^KEGISTinOTEIl^HFPCINCrKALLQAGVKKITYKMITO 120

+GHCIRT+HAE+NAL+QCAK G+ T EIYVTHFPC+NCTKA++Q+G+KK+ Y +Y+ 
Sbjct: 61 VEGHCIRTIHAEVNALLQCAKFGVPTEGAEIYVTHFPCVNCTK&IIQSGIKKVYYATDYK 120 

Query: 121 PHPFAIELMEAKGVAYVQHDVPEVTL 146 

P+A Eli GV Q ++ E+ L 
Sbjct: 121 NSPYAEELFRDAGVDVEQVELEEMIIi 146 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3573> which encodes the amino acid 
sequence <SEQ ID 3574>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3155 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 133/146 (91%), Positives = 140/146 (95%) 

Query: 2 mLSWEDYFMANAELISKRSTCDRA?VGAVLVKNNRIIATGYNGGVSETDNCimVGHYME 61 

I^LSW+DYFMANAELISKRSTCDRAFVGAVLVK+NRIIATGYNGGVS TDNCNE GHYME 
Sbjct: 18 NRLSWQDYFMANAELI SKRSTCDRAFVGA VLVKDNRI IATGYNGGVSATDNCNEAGHYME 77 

Query: 62 DGHCIRTVHAEMNALIQCAKEGISTI^EIYVTHFPCINCTKALLO^GVKKITYKANYRP 121 

DGHCIRTVHAEMNALIQCAKEGIST+ TEIYVTHFPCINCTKALLQAG+ KITYKA+YRP 
Sbjct: 78 DGHCIRTVHAE^ALIQ<2AKEGISTDGTEIYVTHFPCINCTKALLQAGITKITYKAHYRP 137 

Query: 122 HPFAIELMEAKGVAYVQHDVPEVTLG 147 

HPFAIELME KGVAYVQHDVP++ LG 
Sbjct: 138 HPFAIELMEKKGVAYVQHDVPQIVLG 163 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1152 

A DNA sequence (GBSx1228) was identified in S.agalactiae <SEQ ID 3575> which encodes the amino
acid sequence <SEQ ID 3576>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2454 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1153

A DNA sequence (GBSx1229) was identified in S.agalactiae <SEQ ID 3577> which encodes the amino
acid sequence <SEQ ID 3578>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.65 Transmembrane 4 - 20 ( 3 - 21)

Final Results

bacterial membrane --- Certainty=0.1659 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1154 

A DNA sequence (GBSx1230) was identified in S.agalactiae <SEQ ID 3579> which encodes the amino
acid sequence <SEQ ID 3580>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04699 GB:AP001510 unknown conserved protein [Bacillus halodurans]
Identities = 47/94 (50%) , Positives = 65/94 (69%)

Query: 2 LLPVGSWYLIDGNQKLVIVNRGAIVEQEGQEVYFDYLGGIFPEGLNLEQVYYFNQEDID 61 

+LP+GS+VYL +G KL+I+NRG I+E G+ FDY G +P+GL ++V+YFN E+ID 
Sbjct: 1 MLPIGSIVYLKEGTSKLMILNRGPILEANGSNtOTFDYSGCFYPQGLVPDKVFYFNHENID 60 


Query: 62 EWFEGYHDEEEERVSRLIEKHKNTEGKNLPKGK 95 

EWFEG+ D+EE+R +L WK KGK 
Sbjct: 61 EWFEGFQDDEEQRFQKLFHDWKKENKDRYVKGK 94 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1155 

A DNA sequence (GBSx1231) was identified in S.agalactiae <SEQ ID 3581> which encodes the amino
acid sequence <SEQ ID 3582>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3560 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1156 

A DNA sequence (GBSx1232) was identified in S.agalactiae <SEQ ID 3583> which encodes the amino
acid sequence <SEQ ID 3584>. This protein is predicted to be elongation factor P (efp). Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3067 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14376 GB:Z99116 elongation factor P [Bacillus subtilis] 
Identities = 83/186 (47%) , Positives = 120/186 (63%) , Gaps = 1/186 (0%) 

Query: 1 MIEASKLKAGMTFETADGKLIRVLEASHHKPGKGNTIMRMKLRDVRTGSTFDTSYRPEEK 60
MI + + G+T + DG + RV++ H KPGKG +R KLR++RTG+ + ++R EK
Sbjct: 1 M I SVNDFRTGLTI DV- DGGIWRVVDFQHWPGKGlAAFVRSKLRNIiRTGAIQEKTFRAGEK 59

Query: 61 FEQAIIETVPAQYLYKMDDTAYFMNNETYDQYEIPTVNIENELLYILENSEVKIQFYGTE 120
+A IET QYLY D FM+ +Y+Q E+ IE EL Y+LEN V I Y E

+G+++P TVEL V ET+P I KG T +G KPA ETGLWNVP F+ G LV+NT++G

A related DNA sequence was identified in S.pyogenes <SEQ ID 3585> which encodes the amino acid
sequence <SEQ ID 3586>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence
Final Results 

bacterial cytoplasm --- Certainty=0.1813 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 170/186 (91%) , Positives = 180/186 (96%) , Gaps = 1/186 (0%)

Query: 1 MIFASKLKAGMTFETADGKLIRVLEASHHKPGKGNTIMRMKLRDVRTGSTFDTSYRPEEK 60 

MIEASKLKAGMTFE A+GKLIRV1EASHHKPGKGNTIMRMKLRDWTGSTFDT+YRP+EK 
Sbjct: 1 MIFASKLKAGMTFE-AEGIOiIRVLEASHHKPGKC-NTIMRMKLRDVRTGSTFDTTYRPDEK 59 

Query: 61 FEQAIIETVPAQYLYKMDDTAYFMNKETYDQYEIPTVNIENELLYILENSEVKIQFYGTE 120

Query: 121 VIGVQIPTTVELWAETQPSIKGATVTGSGKPATMETC3LVV1IVPDFIEAGQKLVINTAEG 180
VIGV +PTTVELTVAETQPSIKGA'TVTGSGKPAT+ETGU/VNVPDFIEAGQKIj+INTAEG
Sbjct: 120 VIGVTVPTTVELWAETQPSIKGATVTGSGKPA^ 179

Query: 181 TYVSRA 186
TYVSRA
Sbjct: 180 TYVSRA 185

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1157 

A DNA sequence (GBSx1233) was identified in S.agalactiae <SEQ ID 3587> which encodes the amino
acid sequence <SEQ ID 3588>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1508 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06505 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 42/107 (39%) , Positives = 70/107 (65%) , Gaps = 4/107 (3%) 

Query: 5 NLGEIVISPRVLEVITGIAATKVDGVHSLRNK AVTDSLSKKSLGRGVYIiKNEEDDTV 61
+LG + ISP V+EVI GIAA++V+GV ++R V + L K+ G+GV + + D+ +
Sbjct: 15 DLGRvEISPEVIEVIAGIAASEVEGVATMRGNFAAGVAEKLGYKNHGKGVTCV-DLNDEGI 73

Query: 62 AADIYVYLQYGVNVPAVSIAIQQAVKTAVYDMAEVKISSVNIHVEGI 108
D+ V + YGV+VP V+ IQQ +K A+ M +++ S+N+H+ G+
Sbjct: 74 IVDVSVIILYGVSVPEVAKKIQQNIKQALQTMTAIELQSINVHIVGV 120

A related DNA sequence was identified in S.pyogenes <SEQ ID 3589> which encodes the amino acid 
sequence <SEQ ID 3590>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0882 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/129 (78%) , Positives = 113/129 (87%) 

Query: 1 MTTENLGEIVISPRVLEVITGIAATKTOGVHSLRNKAVTDSLSKKSLGRGVYLKNEEDDT 60 

MTTE +GEIVISPRVLEVITGIA T+V+GVHSL NK + DS +K SLG+GVYL+ EED + 
Sbjct: 1 MTTEYIGEIVISPRVLEVITGIATTQVEGVHSLHNKKMADSFNKASLGKGVYLQTEEDGS 60 

Query: 61 VAADIYvYLQYGvOTPAVSIAIQQAVKTAVYr^iAEVKISSVNIHVEGIVPEKTPKPDLKS 120 

V ADIYVYLQYGV VP VS+ IQ+ VK+AVYDMAEV IS+VNIHVEGIV EKTPKPDLKS 
Sbjct: 61 VTADIYWLQYGVKVPTVSMNIQKnnCSAVYDKAEVPISAVNIHVEGIVAEKTPKPDLKS 120 



Query: 121 LFDEDFLDD 129
LFDEDFLDD
Sbjct: 121 LFDEDFLDD 129

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1158 

A DNA sequence (GBSx1234) was identified in S.agalactiae <SEQ ID 3591> which encodes the amino
acid sequence <SEQ ID 3592>. This protein is predicted to be N utilization substance protein B homolog
(nusB). Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.32 Transmembrane 48 - 64 ( 47 - 64)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14363 GB:Z99116 similar to transcription termination
[Bacillus subtilis] 
Identities = 51/129 (39%) , Positives = 82/129 (63%) , Gaps = 9/129 (6%) 

Query: 9 RPJJLRERAFQTLFSLETG^EFIDAAHFAYGYDKTVSEDK^EVPIFLnNLvNGVVDHKDE 68 

RR RE+A Q LF ++ ++ A + + E+K F LV+GV+ +H+D+ 

Sbjct: 3 RRTAREKALQALFQIDVSDIAVNEA IEHALDEEKT DPFFEQLVHGVLEHQDQ 54 

LD +IS HL + W L+R+ VD+++LRL YE+ Y ++ P ~ V++NE IE+AK++ D+ + 
Sbjct: 55 LDEMI SKHLVN- WKLDRI ANVDRAI LRLAAYEMAYAED I P VNVSMNEAI ELAKRFGDDKA 113 

Query: 129 AKFVNGLLS 137

KFVNG+LS
Sbjct: 114 TKFVNGVLS 122 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3593> which encodes the amino acid 
sequence <SEQ ID 3594>. Analysis of this protein sequence reveals the following: 

Possible site: 44
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.75 Transmembrane 53 - 69 ( 53 - 69)

Final Results

bacterial membrane --- Certainty=0.1702 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB14363 GB:Z99116 similar to transcription termination
[Bacillus subtilis] 
Identities = 47/134 (35%) , Positives = 76/134 (56%) , Gaps = 10/134 (7%) 

Query: 15 RRDLRERAFQALFNIEMGAELLAASQFAYGYDKVTGEDAQVLELPIFLLSLVTGVNNHKE 74 

RR RE+A QALF I++ +++ + D+ + F LV GV H++ 

Sbjct: 3 RRTAREKALQALFQIDV-SDIAVNEAIEHALDEEKTDP FFEQLVHGVLEHQD 53 

Query: 75 ELDNLISTHLKKGWSLERLTLTDKTmRIflLFEIKYFDOPDRVALNEIIEWKiaSDET 134 

+LD +IS HL W L+R+ D+ +LRL +E+ Y + P V++NE IE+ K++ D+ 
Sbjct: 54 QLDEMISKHLW-WKLDRIANVDRAIIiTU^YEMAYAEDIPvNVSMNEAIELAKRFGDDK 112

Query: 135 SAKFINGLLSQYVS 148

+ KF+NG+LS S 
Sbjct: 113 ATKFVNGVLSNIKS 126 

An alignment of the GAS and GBS proteins is shown below.

Identities = 104/142 (73%), Positives = 125/142 (87%), Gaps = 1/142 (0%) 

Query: 1 MTSVFKDSRRDLRERAFQTLFSLETGGEFIDAAHFAYGYDKT/SED-KVLEVPIFLLNLV 59 
MT+ F++SRRDLRERAFQ LF++E G E + A+ FAYGYDK ED +VLE+PIFLL+LV 
Sbjct: 7 MTNSFQNSEWDLRERAFQALFNIEMGAELIAASQFAYGYDKVTGEDAQVLELPIFLLSLV 66

Query: 60 NGWDHEODELDTLISSHLKSGWSLERLTLVDKSLLRLGLYEIKYFDETPDRVAL1TOIIEI 119 

GV +HK+ELD LIS+HLK GWSLERLTL DK+LLRLGL+EIKYFD+TPDRVALNEIIE+ 
Sbjct: 67 TGVNNHKEELDNLISTHLKKGWSLERLTLTDKTLLRLGLFEIKYFDKTPDRVALNEIIEV 126 


Query: 120 AKKYSDETSAKFVNGLLSQFIT 141 

KKYSDETSAKF+NGLLSQ+++ 
Sbjct: 127 VKKYSDETSAKFINGLLSQYVS 148 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1159 

A DNA sequence (GBSxl235) was identified in S.agalactiae <SEQ ID 3595> which encodes the amino 

acid sequence <SEQ ID 3596>. Analysis of this protein sequence reveals the following: 

25 Possible site: 20 

>>> Seems to have a cleavable N-terra signal seq. 

INTEGRAL Likelihood = -2.81 Transmembrane 239 - 255 ( 239 - 255) 

Final Results 

30 bacterial membrane Certainty=0 .2126 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC31628 GB:U46902 ScrR [Streptococcus mutans] 

Identities = 225/320 (70%) , Positives = 273/320 (85%) 





Query: 1 IWAKLTDVAAIAGVSPTTVSRVIKKKGYLSQKTVTKVSIEA^riX3YKPITOIARSLQGKSA 60 
         MVAKLTDVA LAGVSPTTVSRVIN+KGYLS+KT+TKV AM4-TLGYKPNNLARSLQGKSA 
Sbjct: 1 WAIO.TDVAKIAGVSPTWSRVINRKGYLSEKTITKVQAMKTI^YKPNNIjARSLQGKSA 60 

Query: 61 KLIGLIFPNIRWIFYAELIEHLEIELFKHGYKTILCNSEKDPIKEKEYLEMLGANQVDGI 120 
          KLIGLIFPNI +IFY+ELIE+LEIELFKHGYK I+CNS+ +P KE+4YLEML ANQVDGI 
Sbjct: 61 KLIGLIFENISHIFYSELIEYLEIELFKHGYKAIICNSQNNPDKERDYLEMLEANQVDGI 120 

Query: 121 ISSSHNLGIDDYEKVEAPIVAFDRNLAPHIPIVSSDNFFGGKMAAQTLKKHGCQKMIMIT 
           ISSSHNLGIDDYEKV API+AFDRNLAP+IPIVSSDNF GG+MAA+ LKKHGCQ IMI 
Sbjct: 121 ISSSHNLGIDDYEKVSAPIIAFDRNLAPNIPIVSSDNFEGGRMAAKLLKKHGCQHPIMIA 180 

Query: 181 GNDNSDSPTGLRRLGFSYESKESKVITVTNGLSHMRREMELKSIISTHKPDGIFTSDDLT 240 
           G DNS+SPT LR+LGF ++ + ++ LS +R+EME+K 1+ KPDGIF SDD+T 
Sbjct: 181 GKDNSNSPTALRQLGFKSVFAQAPIFHLSGELSIIRKEMEIKVILQNEKPDGIFLSDDMT 240 

Query: ALLVIKLISQLGLSIPEDIKVIGYDGTSFIQDYVPHLTTIKQPIREIAQLMVEILLAKIE 300 
       A+L +K+ 4QL ++IP ++K+IGYDGT F+++Y P+LTTT+QPI++IA L+V+ILL KI+ 
Sbjct: AILTMKIANQLNITIPHELKIIGYDGTHFVENYYPYLTTIRQPIKDIAHLLVDILLKKID 300 

Query: 301 GQKTNKDYILPVSLIPGSSV 320 
           Q KDYILPV L+ G SV 
Sbjct: 301 HQDIPKDYILPVGLLSGESV 320 
A related DNA sequence was identified in S.pyogenes <SEQ ID 3597> which encodes the amino acid 
sequence <SEQ ID 3598>. Analysis of this protein sequence reveals the following: 

N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAC31628 GB:U46902 ScrR [Streptococcus mutans] 
Identities = 226/321 (70%) , Positives = 269/321 (83%) , Gaps = 1/321 (0%) 



Query: 1 
Sbjct: 1 

Query: 61 QLIGLIFPNISNIFYAELIEHLEIELFKQGYKTIICNSEHNPVKEREYLEMLAANQVDGI 120 
          +LIGLIFPNIS+IFY+ELIE+LEIELFK GYK IICNS++NP KER+YLEML ANQVDGI 
Sbjct: 61 JCLIGLIFPNISHIFYSELIEYLEIELFKHGYKAIICNSQNNPDKERDYLEMLEANQVDGI 120 

Query: 121 
           ISSSHNLGI+DYE+V API+AFDRNLAPNIP++SSDNFEGG++AA+ L+KHGCQ+ +MI 
Sbjct: 121 

Query: 181 
           6 DNS+SPT LRQLGF + AIL LS +R+EMEIK IL KPDG+F+SDD+ 
Sbjct: 181 

Query: 241 
           TAIL MK+A QL+IT1P ++K+IGYDGT F++ Y P L TIRQPI +IA L V+IL+KKI 
Sbjct: 240 

Query: 301 
           KDYILP+ LL G S+ 
Sbjct: 300 [PKDYILPVGLLSGESV 320 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 247/321 (76%) , Positives = 293/321 (90%) , Gaps = 1/321 (0%) 

Query: 1 ^AKLTDVAALAGVSPTWSRVINKKGYLSQKTVTKVIQEAMRTLGYKPNNLARSLQGKSA 60 
         +VAKLTDVAALRGVSPTTVSRVINKKGYLSQKTV KVN+AMR LGYKPNNLARSLQGKS 
Sbjct: 1 WAKLTDVAALAGVSPTWSRVINKKGYLSQKTVNKVl^KAMRELGYKPNNLARSLQGKST 60 

Query: KLIGLIFPNIRNIFYAELIEHLEIELFKHGYKTILCNSEKDPIKEKEYLEMLGANQVDGI 120 
       +LIGLIFPNI NIFYAELIEHLEIELFK GYKTI+CNSE +P+KE+EYLEML ANQVDGI 
Sbjct: 61 QLIGLIFPNISNIFYAELIEHLEIELFKQGYKTIICNSEHNPVKEREYLEMLAANQVDGI 120 

Query: 121 ISSSHNLGIDDYEKVEAPIVAFDRNIjAPHIPIVSSDNFFGGKMAAQTLICKHGCQKMIMIT 180 
           ISSSHNLGI+DYE+VEaPIVAFDRNLAP+IP++SSDNF GGK+AAQTL+KHGCQ ++MIT 
Sbjct: 121 

Query: 181 
           GNDNSDSPTGLR+LGF+Y+ K S ++I + N LS +RREME+KS I + +T KPDG+F SDDL 
Sbjct: 181 

Query: 240 
           TA+L++K+ QL ++IPED+KVIGYDGT+FIQ YVP L TI+QPI EIA+L 
Sbjct: 241 

Query: 300 
           +KT+KDYILP++L+PG+S+ 
Sbjct: 301 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1160 

A DNA sequence (GBSx1236) was identified in S.agalactiae <SEQ ID 3599> which encodes the amino 
acid sequence <SEQ ID 3600>. This protein is predicted to be sucrose-6-phosphate hydrolase (cscA). 
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4775 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA35872 GB:X51507 sucrose-6-phosphate hydrolase [Streptococcus mutans] 
Identities = 303/479 (63%), Positives = 359/479 (74%), Gaps = 25/479 (5%) 

Query: 1 MNLPTEIRYRPYDEWTEKDKENI VKNVSKSPWRATYHLEAKTGLLNDPNGFSYFNGKFHL 60 
         MNLP IRYR Y +WTEE+ ++I NV+ SPW TYH+E KTGLLNDPNGFSYFNGKF+L 
Sbjct: 1 MNLPQNIRYRRYQDWTEEEIKSIKTNVALSPWHTTYHIEPKTGLENDPNGFSYFNGKFNL 60 

Query: 61 FYQbMPFGAMGLKQWVHTESDDLVHFKETGIKLKPDHVNDSHGAYSGSALAIDDKLFLF 120 
          FYQNWPFGAWIGLK W+IITES +DLVHFKETG L PD +DSHGAYSGSA I D+LFLF 
Sbjct: 61 

Query: 121 
           YTGNVRD W R P QIGA+M G I KF VLI QPNDVTEHFRDPQIFNY QFYA+ 
Sbjct: 121 

Query: IGAQNSKKCGFIKLYKALNNDIHIIWEFVGDLDFGGTGSEYMIECPNIIFVKGKPVLLYSP 240 
       +GAQ+ LDFGG+ SEYMIECPN++F+ +PVL+YSP 
Sbjct: 181 VGAQS LDFGGSKSEYMIECPNLVFINEQPVLIYSP 215 

Query: 241 
Sbjct: 

Query: 301 
           VSWIGLPDIDYPSD +DYQGA+SLVKELS + K+G 1YQYPV A+++LR 
Sbjct: 27S 

Query: 361 
           +LFA+ KG GL+IT+DT G ++IDRS+AG+QYA EFG+ R 
Sbjct: 336 

Query: 421 
           T +NIF+DKSIFEIFINKC-EKVFTGRVFP+ +Q+GI 4 
Sbjct: 396 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3601> which encodes the amino acid 
sequence <SEQ ID 3602>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4629 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 28S/479 (60%) , Positives = 367/479 (76%) 

Query: 1 IWLPTEIRYRPYDEWTEEDKENIVKNVSKSPWRATYHLEAKTGLLNDPNGFSYFNGKFHL 60 

M+LP IRYRPY EW+ +D + I + +++SPW + +H4E KTGLLNDPNGFSYFNG++HL 
Sbjct: 2 MDLPQAIRYRPYKEWSSKDYQAITEKMAQSPWHSQFHVEPKTGLLNDPNGFSYFNGRYHL 61 

Query: 61 FYQHWPFGAAHGLKQWVHTESDDLVHFKETGIKLKPDHVKDSHGAYSGSAIAIDDKLFLF 120 

FYQNWP+GAAHGLKQWVH S DLVHF ET +L PDH +DSHGAYSGSA AIDDKLFLF 
Sbjct: 62 FYQNWPYGAAHGLKQWVHMTSTDLVHFTETRSRLLPDHAHDSHGAYSGSAYAIDDKLFLF 121 

Query: 121 YTGNTODMKWI^PRQIGAmTI^GKITKFDKVLISQPimvTEHFRDPQIFOTDNQFYAV 180 

YTGNVRD W R P Q+GAWM G I+K +VLI QP+DVTEHFRDPQ+F+Y QFYA+ 
Sbjct: 122 YTGNVRDANWVRTPLQVGAWMDKQGNI SKI PQVLIEQPDDVTEHFRDPQLFSYQGQFYAI 181 

Query: 181 IGAQNSIOCCGFIKLYICAIJJNDIHHWEFVGDLDFGGTGSEYMIECPNIIFVKGKPVLLYSP 240 

IGAQ G 1KLYKA++N + +W F+ DLDF +G+EYMIECPN++FV KPVL++SP 

Sbjct: 182 IGAQGLDGKGKIKLYKAVDNHTONWRFIADLDFDDSGTEYMIECPNLVFVDDKPVLIFSP 241 

Query: 241 QGLDKNELDYQNIYPNTYKIGQYFDANSSKIVEPSPIYNLDYGFEAYATQGFNTSDGRAF 300 

QGL K +LDYQNIYPNTYKI + F+ + +++ + NLD+GFEAYATQ F++ DGR 
Sbjct: 242 C^IAKADLDYQNIYPNTYKIFESFNPETGQLLGGGALQNLDFGFEAYATQAFSSPDGRVL 301 

Query: 301 IVSWIGLPDIDYPSDQFDYQGAMSLVKELSIKNGMLYQYPVPAMKNLRQHQAEFKTQLQT 360 

VSWIGLPDIDYP+D++DYQGA+SLVKEL IK+G LYQ PV A++NLR F ++ + 

Sbjct: 302 AVSWIGLPDIDYPTDRYDYQGALSLVKELRIKDGILYQTPVSALQNLRGPAELFHNKIDS 361 

Query: 361 ITOTYELELLVPRNDLSSFVIjFANPKGQGLSITIDTVKGKVIIDRSQAGQQYATEFGTSRQ 420 

+N YELEL 4P +LFA+ KG GL + +DT KG++ IDRS+AG QYA ++GT R 

Sbjct: 362 SNCYELELTIPGQKKLDLLLFADQKGNGLRLKVDTTKGQLSIDRSRAGVQYAQDYGTVRS 421 

Query: 421 CDIPKDATSINIFIDKSIFEIFINKGEKVFTGRWPDAEQSGIQLKEGHVHGKYFELKY 479 

C IP+ ++N+++D SI EIFIN+G+KV T RVFP Q+GIQ+ EG G Y+E++Y 
Sbjct: 422 CQIPQGHVTLNVYVDNSILEIFINQGQKVLTSRVFPTHGQTGIQWEGQAFGHYYEMRY 480 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1161 

A DNA sequence (GBSx1237) was identified in S.agalactiae <SEQ ID 3603> which encodes the amino 
acid sequence <SEQ ID 3604>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2204 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1162 

A DNA sequence (GBSx1238) was identified in S.agalactiae <SEQ ID 3605> which encodes the amino 
acid sequence <SEQ ID 3606>. Analysis of this protein sequence reveals the following: 



Possible site: 27 

>>> Seems to have no N-terminal signal sequence 










INTEGRAL Likelihood = -7.54 Transmembrane 259 -     ( 250 - 283) 
INTEGRAL Likelihood = -4.41 Transmembrane 113 - 129 ( 109 - 130) 
INTEGRAL Likelihood = -3.03 Transmembrane 180 - 196 ( 180 - 196) 
INTEGRAL Likelihood = -3.03 Transmembrane 439 - 455 ( 438 - 456) 
INTEGRAL Likelihood = -2.81 Transmembrane 298 - 314 ( 298 - 317) 
INTEGRAL Likelihood = -2.02 Transmembrane 396 - 412 ( 395 - 412) 
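
Transmembrane predictions in this output follow the pattern `INTEGRAL Likelihood = -2.81 Transmembrane 239 - 255 ( 239 - 255)`, as shown for Example 1159. A minimal sketch for reading such a line is given below, assuming the first pair of numbers is the predicted segment and the parenthesised pair is its extended range; the function name `parse_integral` and the returned field names are illustrative only.

```python
import re

INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str):
    """Return likelihood and residue ranges for one INTEGRAL line, or None."""
    match = INTEGRAL_RE.search(line)
    if match is None:
        return None  # incomplete or damaged line
    likelihood, start, end, ext_start, ext_end = match.groups()
    return {
        "likelihood": float(likelihood),
        "segment": (int(start), int(end)),
        "extended": (int(ext_start), int(ext_end)),
    }


segment = parse_integral(
    "INTEGRAL Likelihood = -2.81 Transmembrane 239 - 255 ( 239 - 255)"
)
print(segment)
```

Lines with a missing coordinate do not match the pattern and are returned as `None` rather than guessed.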



Final Results 

bacterial membrane --- Certainty=0.4057 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC99320 GB:AF059741 sucrose-specific PTS permease [Clostridium beijerinckii] 
Identities = 235/453 (51%) , Positives = 312/453 (67%) , Gaps = 15/453 (3%) 



12 7 MGIRGAINNDTVLALFG , J , TSKAFSSSNFYTYTWLTDTAFAFFPALISWSAFR\ 

MG+RG + N V + NF 4-T VLTDTAFAF PAL++WS + FGG PV 

125 MGLRGLLTNLGVQM NENFVLFTQVLTDTAFAFLPALVAWSTMKKFGGTPV 174 

187 IGL VLGLMMVNSALPNAWA VASGDAHPI KF - - FGF - 1 P WGYQNS VLPAFFVGLLGAKLE 243 

IG+V+GLM+V+ +LPNA+AVA+G A PI G IPWGYQ SVLPA +G++ AK + 

175 IGIVIGLMLVSPSLPI^YAVAAGTATPINLTII^IjKIPVVGYQGSvLPALVLGIIAAKTQ 234 

244 KWLHKKIPDVLDLLLVPFLTFTVMSILALFVIGPIFHSVENYVIAGTKFVLNLPLGLSGL 303 

K L K +PDVLDL++ PF+T +L L ++GPI H+ E + K + LP GL GL 

235 KALKKVVPDVLDLIVTPFITLLFSMVLGLLIVGPIMHNAEQLIFGAIKGFMGLPFGLGGL 294 

304 ILGGvHQIIWTGVHHIFNLLEAQLIAADGXDPFNAIITAAMTAQAGATmVGVKTKNI^K 363 

++GGVHQ+IWIGVHH N LE +L+++ GKD FNA+IT + AQ A LAV VKTK+KK 
295 WGGWQLIvWGVHHALNALEVELLSSTGKDAFNAMITCGIVAQGAAALAVAVKTKDKK 354 

364 LKALAFPAALSAGLGITEPAIFGVNLRFGKPFIMGLIAGAAGGWLASILKLAGTGFGITI 423 

++L +A+ A LGITEPAIFGVNLRF KPFI G GA GG L+ IL LAGTG GIT 
355 KRSLYISSAIPAFLGITEPAIFGVKLRFIKPFIFGCAGGAVGGMLSGILHLAGTGMGITA 414 

424 IPGTLLYLNGQIVKYLIMVIGTTSLAFVLTYMF 456 

+PG LLY+N + Y+++ + ++AF LT F 
415 LPGMLLYVN-NLGSYILVNWAIAVAFCLTLFF 446 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3607> which encodes the amino acid 
sequence <SEQ ID 3608>. Analysis of this protein sequence reveals the following: 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = - 
Likelihood = - 
Likelihood = - 
Likelihood = - 
Likelihood = - 



Transmembrane 111 
Transmembrane 176 - 19: 
Transmembrane 436 - 45: 
Transmembrane 
Transmembrane 



127 I 



275 ; 
INTEGRAL Likelihood = -0.43 Transmembrane 219 



Final Results 

bacterial membrane --- Certainty=0.2996 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAC99320 GB:AF059741 sucrose-specific PTS permease [Clostridium 
beijerinckii] 

Identities = 234/451 (51%) , Positives = 312/451 (68%) , Gaps = 11/451 (2%) 

Query: 1 MDNRQIAREVIEALG^RENTOSVAHffiTRLRVMVYDEGKIDKEKAEAIDKVKGAFFNSSQ 60 

M + +A E++E +GG+EN++SV HCATRLR+++ D+ KI+++ E ID VKG FF++ Q 
Sbjct: 1 MKEQIVAKEILENIGGKENIKSVEHCATRLRjlLNDKEKINEKAIENIDGVKGQFFSAAQ 60 

Query: 61 YQMIFGTGTVNNIYDEWALGLPTSSTSEQKAEAGKHGNIFQRAIRTFGDVFVPIIPAIV 120 

YQ+I GTG VN +YD +V T K EA + Q+ RTFGDVFVPIIP +V 

Sbjct: 61 YQIILGTGFVNEVYDVIVGQNSDLV-TGNNKEEAYSQMTLIQKISRTFGDVFVPIIPVLV 119 

Query: 121 ATGLFMGVRGLVTQPAIMDLFGVHEYGENFLMYTRILTDTAFVYLPALVAWSAFRVFGGN 180 

ATGLFMG+RGL+T + + ENF+++T++LTDTAF +LPALVAWS + FGG 

Sbjct: 120 ATGLFMGLRGLLTNLGV QMNENFVLFTQVLTDTAFAFLPALVAWSTMKKFGGT 172 

Query: 181 PIIGIVLGLMLVSNELPNAWWASGGDVK-PLTFFGF-VPWGYQGTVLPAFFVGLVGAK 238 

P+IGIV+GLMLVS LPNA+ VA+G LT G +PWGYQ3+VLPA +G++ AK 

Sbjct: 173 PVIGIVIGLMLVSPSLPNAYAVAAGTATPINLTILGLNIPWGYQGSVLPALVLGIIAAK 232 

Query: 233 LEKWLHKIWPEALDLLVTPFI/rFAIMSTLGLFVIGPvTB 298 

+K L K VP+ LDL+VTPF+T LGL ++GP+ H+ E L+ + + LPFG+ 

Sbjct: 233 TQKALKKWPDVLDLIVTPFITLLFSMVLGLLIVGPIMHNAEQLIFGAIKGFMGLPFGLG 292 

Query: 299 GLIVGGIO^LIVVTGIHHIFNFLEAQLlftNTGKDPFNAYLTAATAAQAGATLAVAVKTKS 358 

GL+VGG+ QLIWTG+HH N LE +L+++TGKD FNA +T AQ A LAVAVKTK 
Sbjct: 293 GLWGGVHQLIVVTGVHHAimLEVELLSSTGKDA™M:TCGIVAQGAAALAVAVKTKD 352 

Query: 359 TKLKGIAFESTLSALLGITEPAIFGVNLRYPKVOTSGLIGGALGGWVAGLFGIAGTGFGI 418 

K + L S + A LGITEPAIFGVNLR+ K F+ G GGA+GG ++G+ +AGTG GI 
Sbjct: 353 KKKRSLYISSAIPAFLGITEPAIFGVNLRFIKPFIFGCAGGAVGGMLSGILHLAGTGMGI 412 

Query: 419 TVLPGTLLYLNGQLLQYLVTMLVGLGVAFAI 449 

T LPG LLY+N L Y++ +V + VAF + 
Sbjct: 413 TALPGMLLYVN - NLGS YIL VNWAIAVAFCL 442 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 409/618 (66%), Positives = 491/618 (79%), Gaps = 12/618 (1%) 

Query: 4 NTEIAKQVINAIGGASNVRSVAHCATRLRVM\n^ET\TDKNTV^MIEKVQGAFFNSGQYQ 63 

N +IA +VI A+GG NVRSVAHCATRLRVMV DE IDK E 1 4 KV+GAFFNSGQYQ 
Sbjct: 3 NRQIAAEVIEALGGRENVRSVAHCATRLRVMVYDEGKIDKEKAEAIDKVKGAFFNSGQYQ 62 

Query: 64 IIFGTGTVNKIYDEWAQGLPTSSTSDQKAEAftKQGNAFQRAIRTFGDVFVPLLPAIVAT 123 

+ IFGTGTVN IYDEWA GLPTSSTS+QKAEA K GN FQRAIRTFGDVFVP+ + PAIVAT 
Sbjct: 63 MIFGTGTVNNIYDEWALGLPTSSTSEQKAEAGKHGNIFQRAIRTFGDVFVPIIPAIVAT 122 

Query: 124 GLFMGIRGAINNDTVIiALFGTTSKAFSSSNFYT^TVVLTDTAFAFFPALISWSAFRVFGG 183 

GLFMG+RG + ++ LFG NF YT +LTDTAF + PAL++WSAFRVFGG 

Sbjct: 123 GLFMGVRGLVTQPAIMDLFGVHEYG ENFLMYTRILTDTAFVYLPALVAWSAFRVFGG 179 

Query: 184 NPVIGLVLGLM^VNSALPNAWAVASG-DAHPIKFFGFIPVVGYQNSVLPAFFVGLLGAKL 242 

NP+IG+VLGLM+V++ LPNAW VASG D P+ FFGF+PWGYQ +VLPAFFVGL+GAKL 
Sbjct: 180 NPIIGIVLGLMLVSNELPNAWWASGGDVKPLTFFGFVPWGYQGTVLPAFFVGLVGAKL 239 



Query: 243 EKI^HKKIPDvLDLLLVPFLTFTVMSILALFVlGPIFHSVENYVLAGTKFV]m,PLGLSG 302 

EKWLHKK+P+ LDLL+ PFLTF +MS L LFVIGP+FHS+EN VLAGT+ VL+LP G++G 
Sbjct: 240 EKOTLHK^PEMjDLLVTPFLTFAIMSTL3LFVIGPVFHSL3NLVLAC3TQAVLHLPFGIAG 299 





Query: 303 
Sbjct: 300 

Query: 363 
Sbjct: 360 

Query: 423 
Sbjct: 420 

Query: 483 
Sbjct: 478 

Query: 543 
Sbjct: 

Query: 603 
Sbjct: 592 



++PGTLLYLNGQ+++YL+ 4+ +AF + Y +GY+D++ 



IjD+T M+IV+N AD+ V 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1163 

A DNA sequence (GBSx1239) was identified in S.agalactiae <SEQ ID 3609> which encodes the amino 
acid sequence <SEQ ID 3610>. This protein is predicted to be fructokinase. Analysis of this protein 
sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2436 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA02467 GB:D13175 fructokinase [Streptococcus mutans] 
Identities = 232/291 (79%), Positives = 257/291 (87%) 

Query: 1 MTKLYGSIFAGGTKFVCAVGDEELfCVVBKKQFPTTTPQSTIKKTVDFFKRFEKKLEAVAI 60 

M+KLYGSIFAGGTKFVCAVGDE +++EK+QFPTTTP ETI+KTV FFK+FE L +VAI 
Sbjct: 1 MSPa^YGSIEaGGTKFVCAVGDENFQILEKVQFr^TTTPYETIEKTVAFFKKFEADLASVAI 60 





Query: 61 GSFGPIDIDKKSKTYGYITTTPKLHWANVDLLGLISKDFNVPFYFTTDVNSSAYGEVIAR 
          GSFGPIDID+ S TYGYIT+TPK +WANVD +GLISKDF +PFYFTTDVNSSAYGE IAR 
Sbjct: 61 GSFGPIDIDQNSDTYGYITSTPKPNWANVDF^^GLISKDFKIPFYFTTDVNSSAYGETIAR 120 

Query: 121 NNIDSLVYYTIGTGIGAGAIQKGEFIGGTGHTEAGHTYMRMHPQDQANDFKGICPFHNSC 180 
           +N+ SLVYYTIGTGIGAGAIQ GEFIGG GHTEAGH YMA HP D + F G CPFH C 
Sbjct: 121 SNVTCSLVYYTIGTGIGAGAIQNGEFIGGMGHTEAGHVYMAPHPNDVHHGFVGTCPFHKGC 180 

Query: 181 LEGMSGPTLEARTGIRGELIEFJ^SMVWDVOAlYTAQAAIQATVLYRPQVIVFGGGVMAQ 240 
           LEGIA+GP+LEARTGIRGELIE+NS VWD+QAYYIAQAAIQATVLYRPQVIVFGGGVMAQ 
Sbjct: 181 LEGLAAGPSLEARTGIRGELIEQNSFA^TDIQAYYIAQAAIQATVLYRPQVIVFGGGVMAQ 240 

Query: 241 EHMLRRVRQTFATLIJSIGYLPVPDLSDYIVTPAIEENGSATLGNFALAKKIS 291 
           EHML RVR+ P +LLN YLPVPD+ DYIVTPA+ ENGSATLGN AIAKKI + 
Sbjct: 241 EHMimWEKFTSLimiYLPVPDVKDYIVTPAVAENGSATLGI^LALAKKIA 291 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3611> which encodes the amino acid 
sequence <SEQ ID 3612>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2012 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 212/293 (72%) , Positives = 246/293 (83%) 



Query: 1 MTKLYGSIEAGGTKFVCAVGDEELKVVEKMQFPTTTPQET^KKTVDFFKRFEKKLEAVAI 60 

M KLYGS I EAGGTKFVCAVGDEE W+K QFPTTTP+ETI +T+ +FK FE L +AI 
Sbjct: 1 MGKLYGSIEAGGTKFVCAVGDEEFTVVDKTQFPTTTPEETIARTIAYFKAFE7ADIAGMAI 60 



Query: 61 GSFGPIDIDK] 
GSFGPIDID 

Sbjct: 61 GSFGPIDrDPSSETYGYITTTPKSGWANVDLLGQLSAAFKIPFDVTTDVNSSAYGEVLAR 120 

Query: 121 NNIDSLVYYTIGTGIGAGAIQKGEFIGGTGHTEAGHTYMAMHPQDQANDFKGICPFHNSC 180 

++SLVYYTIGTGIGAGAIQ G FIGG GHTEAGHTY+ HP D A F G+CPFH C 
Sbjct: 121 PGVESLVYYTIGTGIGAGAIQHGHFIGGLGHTEAGHTYVMPHPDDMAKGFLGVCPFHKGC 180 



Query: 181 LEGIASGPTLEARTGIRGELIEENSMVWDVQAYYIAQAAIQATVLYRPQVIVFGGGVMAQ 240 

LEG+A+GP++EARTG+RGE +++ + VWD+QA+YIAQAA+QAT+LYRPQVIVFGGGVMAQ 
Sbjct: 181 LEGMAAGPSIEARTGWGERLDQEADVTOIQAFYIAQAALQATMLYRPQVIVFGGGVMRQ 240 

Query: 241 EHMLRRVRQTFATLLNGYI.PVPDLSDYIVTPAIEENGSATLGNFALAKKISKG 293 

EHM+ RV F LL+GYLPVPDL+DYIVTPA+ +NGSATLGNFALAK ++G 
Sbjct: 241 EHMVLRVHDKFTALLSGYLPVPDLTDYIVTPAVADNGSATLGNFAIjAKLAAQG 293 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1164 

A DNA sequence (GBSx1240) was identified in S.agalactiae <SEQ ID 3613> which encodes the amino 
acid sequence <SEQ ID 3614>. This protein is predicted to be Mannosephosphate Isomerase (pmi). 
Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4717 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA04021 GB:D16594 Mannosephosphate Isomerase [Streptococcus mutans] 
Identities = 232/312 (74%) , Positives = 262/312 (83%) 

Query: 1 MSEPLFLEASMHDKIWGGTKLRDEFGYDIPSETTGEYWAISA-IPNGVSRVKNGRFEGCFL 60 

M PLFL++ MH KIWGG +LR EFGYDI PSETTGEYWAI SAHPNGVS VKNG +KG L 
Sbjct: 1 MEGPLFLQSQMHKKIWGGNRLRKEFGYDIPSETTGEYWAISAHPNGVSWKNGVYKGVPL 60 
Query: 61 
          I- LFGN 4VFPLLTKILDANDWLSVQVHPD4AYAL4HEGELGKTECWY+IS 
Sbjct: 61 

Query: 121 
Sbjct: 121 

Query: 181 
           ETQQSSDTTYRVYDFDR D G4 R LHIEQSIDVLTIGKPAN PA + L+ L 4 
Sbjct: 181 

Query: 241 
           S+ FFTVYKW+ISG +Q APYLLVSVL G G ITV + Y L+KGDH ILPN 4 
Sbjct: 241 

Query: 301 
Sbjct: 301 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3615> which encodes the amino acid 
sequence <SEQ ID 3616>. Analysis of this protein sequence reveals the following: 

Possible site: 53 



N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3714 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 232/312 (74%) , Positives = 264/312 (84%) 

Query: 1 MSEPLFLEASMHDKIWGGTKLRDEFGYDIPSETTGEYWAISAHPNGVSRVKNGRFKGCFL 60 

MSEPLFL+++MHD+IWGGTKLRD F Y+IPS+TTGEYWAISAHPNGVS V NGR++G L 
Sbjct: 1 MSEPLFLKSTMHDRIWGGTKLRDVFAYNIPSDTTGEYWAISAHPNGVSTVTNGRYQGQPL 60 

Query: 61 DKIYQGEKSLFGNPDDTVFPLLTKILDANDWIjSVQVHPDDAYALKHEGELGKTECWYIIS 120 

4 LY E +LFGNP 4 VFPLLTKILDANDWLSVQVHPDDAY 4HEGELGKTECWYIIS 
Sbjct: 61 NTLYAQEPALFGNPKEEVFPLLTKILDANDWLSVQvHPDDAYGREHEGELGKTECWYIIS 120 

Query: 121 ADEGSEIIYGHNAKTKEELRQMIESGDWEHLLTRIPVKSGDFYYVPSGTMHAIGKGILIL 180 

A+EGSEI+YGH AK4KE+LR MIE+G W+ LLTR+-PVK4GDF+YVPSGTMHAIGKGILIL 
Sbjct: 121 AEEGSEIOTGHCAKSKEDLRAMIEAGAVIDDLLTRVPVKAGDFFYVPSGTMHAIGKGILIL 180 

Query: 181 ETQQSSDTTYRvYDFDRPDASGKLRDLHIEQSIDvLTIGKPANTVPANMKLKHLSSTLLV 240 

ETQQSSDTTYRVYDFDR D 4G LRDLHIE+SIDVLTIGKP N+VPA M L ++ +T LV 
Sbjct: 181 ETQQSSDTTYRVYDFDRKTJWGNLRDLHIEKSIDVLTIGKPENSVPATMVLDNMVATTLV 240 

Query: 241 SNDFFTVYKWEISGVTNFKQFAPYLLVSVT.DGAGHITVDNKVYTLKKGDHFILPNDWEOT 300 

5 FFTVYKW S 4 + KQ APYLLVSVL G G 4 VD K Y L+KG HFILPNDV W 
Sbjct: 241 STPFFTVYKWVTSQITOMKQAAPYLLVSVLKGCGKLYVDQKAYELEKGMEFILPNDVKSW 300 

Query: 301 DIDGQLEIIASH 312 

DGQLE+I SH 
Sbjct: 301 SFDGQLEMIVSH 312 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1165 

A DNA sequence (GBSx1241) was identified in S.agalactiae <SEQ ID 3617> which encodes the amino 
acid sequence <SEQ ID 3618>. This protein is predicted to be preprotein translocase SecA subunit (secA). 
Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1102 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10107> which encodes amino acid sequence <SEQ ID 
10108> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA50286 GB:L32090 secA [Listeria monocytogenes] 
Identities = 503/843 (59%) , Positives = 643/843 (75%) , Gaps = 16/843 (1%) 

Query: 11 MANILRTOIENDKGELKKLDKIAKKVDSYADHMAALSDEALQAKTPEFKERYQNGETLDQ 70 
20 MA +L+ + E+ K ++K L++ A ++ + AD AALSD+AL+ KT EFKER Q GETLD 

Sbjct: 1 MAGLLKKIFESGKKDVKYLERKADEIIAI^ETAALSDDALREKTVEFKERVQKGETLDD 60 

Query: 71 LLPEAFAWREASKRVLGLYPYHVQIMGGIVLHHGDIPEMRTGEGKTLTATMPVYLNAIS 130 
IiL EAFAV RE +KR LGLYP+ VQ+MGGIVLH +1 EM+TGEGKTLTAT+PVYLNA+S 
Sbjct: 61 LLVEAFAVAREGAKRALGLYPFKVQLMGGIVLHEDNIAEMKTGEGKTLTATLPVYLNALS 120 

Query: 131 GLGVHVITVNEYLSTRDATEMGEVYSV^GLSVGINIAAKSPFEKREAYNCDITYSTHAEV 190 

G GVHV+TVNEYL+ RDA EMG +Y++LGLSVG+NL A S EKREAY CDITYSTN E+ 
Sbjct: 121 GEGVHWTVNEYIAHRDAEEMGVLYNFM^ 180 

Query: 191 GFDYLRDMWTOQEDWQRPIJ^ALVDEVDSVLIDFJ^TPLIVSGPVSSEMNQIiYTRADM 250 

GFDYLRDNMW +E+MVQRPL +A++DEVDS+L+DEARTPLI+SG + + LY RA+ 
Sbjct: 181 GFDYLRDNMVWKEEWQRPIAFAVIDEVDSILVDET^TPLIISGE-AEKSTILYVRANT 239 

Query: 251 FVTCTL-NSDDYIIDVPTKTIGLSDTGIDKAEm-FHLNiNrLYPLEWALTHYIDNALRANYI 309 

FV+TL +DY +D+ TK++ L++ G+ K ENYF + NL+DLEN + H+I AL+ANY 
Sbjct: 240 FTOTLTEEEDYTVDIKTKSVQLTEDGMTKGENYFDVENLFDLENWILHHIAQALKANYT 299 

Query: 310 MLLNIDYWSEEQEILIVDQFTGRTMEGRRFSDGIjHQAIEAKESVPIQEESKTSASITYQ 369 
40 M L++DYW ++ E+LIVDQFTGR M+GRRFS+GLHQA+EAKE V IQ ESKT A+IT+Q 

Sbjct: 300 MSLDVDYW-QDDEVLIVDQFTGRIMKGRRFSEGLHQALEAKEGVTIQNESKTMATITFQ 358 

Query: 370 1WFRMYHKIAGMTGTGKTEEEEFREIY1WVIPIPTKRPVQRIDHSDLLYPTLDSKFRAV 429 
N FRMY KLAGMTGT KTEEEEFR+ IYNMRVI IPTN+ + R D DL+Y T+++KF AV 
Sbjct: 359 ITYFRMYKKIAGMTGTAKTEEEEFRDIY]^VIEIFmKVIIRDDRPDLIYTTMEAKFNAV 418 

Query: 430 VADVICERYEQGQPVLVGWAVETSDLISRKLVAAGVPHEVLNAK^FKEAQIIMNAGQRG 489 

V D+ ER+ +GQPVLVGTVA+ +LIS KL G+ H+VLNAK H +EA II +AG+RG 
Sbjct: 419 VED IAERHAKGQPVliVGTVAMNI - EIjI SS KLKRKGI KHDVLNAKQHEREAD 1 1 KHAGERG 477 

Query: 490 AVTIATNMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFY 549 

AV I ATNMAGRGTD I KLGEG E GGL VIGTERHESRRIDNQLRGRSGRQGDPG +QFY 
Sbjct: 478 AWIATNMAGRGTDIKLGEGTIEAGGLAVIGTERHESRRIDNQLRGRSGRQGDPGVTQFY 537 



Query: 

Sbjct: 



610 QYDDVMREQREIIYANRREVITAERDLGPELKGMIKRTIKRAVDAHSRSDKNTAA---EA 666 

QYDDV+R+QRE+IY R EVI AE L ++ MI+RT+ V +++ S + A + 
596 QYDDVLRQQREVIYKQRYEVINAENSLREIIEQMIQRTVNFIVSSNASSHEPEEAWNLQG 655 
Query: 667 IVNFARSALLDEEAITVSELRGLKEAE I KBLLYERALAVY3QQIAKLKDPEAI IEFQKVL 726 

+ LL E IT+ +L+ +1+ L+ ++ A Y+++ L PE EF+KV+ 

Sbjct: 656 IIDYVDflNLLPEGTITLEDLQNRTSEDIQNLILDKIKAAYDEK-ETLLPPEEFNEFEKW 714 

Query: 727 ILMWDNQWTEHIDALDQLRNSVGLRGyAQNNPIVEYQSEGFRMFQDMIGSIEFDVTRTL 786 

+L WD +W +HIDA+D LR+ + LR Y Q +P+ EYQSEGF MF+ M+ SI+ DV R + 
Sbjct: 715 LLRWDTKWVDHIDAMDHLRDGIHLRAyGQIDPLREYQSEGFEMFEAMVSSIDEDVARYI 774 

Query: 787 MKAQIHEQ-ERER-ASQHATTTAEQWISAQHVPMNNESPEYQGIKRNDKCPCGSGMKFKN 844 

MKA+I + ERE+ AAAE A+P++ QIRND CPCGSG K+KN 
Sbjct: 775 MKAEIRQNLEREQVAKGEAINPAEGKPEAKRQPIRKD QHIGRNDPCPCGSGKKYKN 830 

Query: 845 CHG 847 
CHG 

Sbjct: 831 CHG 833 
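
Each database hit above is introduced by a header such as `>GP:AAA50286 GB:L32090 secA [Listeria monocytogenes]`. The sketch below shows one way such a header could be split into its parts, assuming the layout is always a `GP:` accession, a `GB:` accession, a free-text description and an organism in square brackets; `parse_hit_header` is a hypothetical helper, and headers wrapped over two lines would need to be joined before parsing.

```python
import re

HEADER_RE = re.compile(r">GP:(\S+)\s+GB:(\S+)\s+(.*?)\s*\[([^\]]+)\]")


def parse_hit_header(header: str):
    """Split a GENPEPT hit header into accession, GenBank ID, description, organism."""
    match = HEADER_RE.search(" ".join(header.split()))  # collapse line wraps
    if match is None:
        return None
    protein_acc, genbank_acc, description, organism = match.groups()
    return {
        "protein_accession": protein_acc,
        "genbank_accession": genbank_acc,
        "description": description,
        "organism": organism,
    }


hit = parse_hit_header(">GP:AAA50286 GB:L32090 secA [Listeria monocytogenes]")
print(hit["description"], "from", hit["organism"])
```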

A related DNA sequence was identified in S.pyogenes <SEQ ID 3619> which encodes the amino acid 
sequence <SEQ ID 3620>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4443 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 710/837 (84%), Positives = 777/837 (92%), Gaps = 3/837 (0%) 

Query: 11 MANILRWIENDKGELKKLDKIAKKVDSYADHMAALSDEALQAKTPEFKERYQNGETLDQ 70 

MANILR VI ENDKGEL+KL+ KI AKKV+SYAD MA+LSD LQ KT EFKERYQ GETL+Q 
Sbjct: 1 MANILRKVIENDKGELRKLEKIAKKVESYADQMASLSDRDLQGKTLEFKERYQKGETLEQ 60 

Query: 71 LLPEAFAWREASKRVLGLYPYHVQIMGGIVLHHGDIPEMRTGEGKTLTATMPVYLNAIS 130 

LLPEAFAWREA+KRVLGL+PY VQIMGGIVLH+GD+PEMRTGEGKTLTATMPVY1NAI + 
Sbjct: 61 LLPEAFAWREAAKRVLGLFPYRVQIMGGIVLENGDVPSMRTGEGKTLTATMPVYLNAIA 120 

Query: 131 GLGVHVITVNEYDSTRDATEMGEVYSWLGLSVGINLAAKSPFEKREAYNCDITYSTNAEV 190 

G GVHVITVNEYLSTRDATEMGEVYSWLGBSVGINLAAKSP EKREAYNCDITYSTN+EV 
Sbjct: 121 GEGVHVITVITOYLSTRDATEMGEVYSWLGLSVGINLAAKSPAEKREAYNCDITYSTNSEV 180 

Query: 191 GFDYLRDimVVRQEDfWQRPI^ALVDEVDSVLIDEARTPLIVSGPVSSEMNQLYTRADM 250 

GFDYLRDNMWRQEDMVQRPLN+ALVDEVDSVLIDEARTPLIVSG VSSE NQLY RADM 
Sbjct: 181 GFDYLRDNMWRQEDMVQRPLNFALVDEVDSVLIDEARTPLIVSGAVSSETNQLYIRADM 240 

Query: 251 FVKTLNSDDYIIDVPTKTIGLSDTGIDKAENYFHIJOTIjYDLEWALTHYIDI^FAIJYIM 310 

FVKTIi S DY+IDVPTKTIGLSD+GIDKAE+YF+L+NLYD+ENVALTH+IDNALPANYIM 
Sbjct: 241 FVKTLTSVDYVIDVPTKTIGLSDSGIDKAESYFNLSNLYDIENVALTHFIDNALRANYIM 300 

Query: 311 LLNIDYWSEEQEILIVDQFTGRTriEGRRFSDGLHQAIEAKESVPIQEESKTSASITYQN 370 

LL+IDYWSE+ EILIVDQFTGRTMEGRRFSDGLHQAIEAKE V IQEESKTSASITYQN 
Sbjct: 301 LLDIDYWSEDGEILIVDQFTGRTM3GRRFSDGLHQAIEAKEGVRIQEESKTSASITYQN 360 

Query: 371 MFRiynfHKLAGMTGTGKTEEEEFREIYNMRVI PI PTNRPVQRIDHSDLLYPTLDSKFRAW 430 

MFRMY KLAGMTGT KTEEEEFRE+YNMR+ 1 P I PTNRP+ RIDH+DLLYPTL+SKFRAW 
Sbjct: 3 61 MFPJWKIG^GMTGTAKTEEEEFREVYNMRIIPIPINRPIARIDHTDLLYPTLESKFRAVV 420 

Query: 431 ADVKERYEQGQPVLVGTVAVETSDLISRKLVAAGVPHEVLNAKNHFKEAQIIMNAGQRGA 490 

DVK R+ +GQP+LVGTVAVETSDLISRKLV AG+PHEVLNAKNHFKEAQIIMNAGQRGA 
Sbjct: 421 EDWTRHAKGQPILVGWAVETSDLISRKLVEAGIPHEVLI^AKNHFKEAQIIMNAGQRGA 480 



Query: 

VTIATDJMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFYL 
Sbjct: 481 VTIATNMAGRGTDIKLGEGVRELGGLCVIGTERHESRRIDNQLRGRSGRQGDPGESQFYL 540 
Sbjct: 541 SLEDDLMRRFGSDRIKAFLDRMKL1 E ED' 'I, 3 -" ijURQVESAQKRVEGNNIDTRKQVLQ 600 

Query: 611 YDDVMREQRE1IYANRREVITAERDLGPELKGMIKRTIKRAVDAHSRSDKNTAAEAIVNF 670 

YDDVMREQREIIYANRR+VITA RDLGPE+K MIKRTI RAVDAH+RS++ A +AIV F 
Sbjct: 601 YDDVMREQREIIYANRRDVITANRDLGPEIKAMIKRT1DRAVDAHARSNRKDAIDAIVTF 660 

Query: 671 ARSALLDEFAITVSELRGLKEAEIKELLYERALAVYEQQIAKIJKDPEAI1EFQKVLIL^^V 730 

AR++L+ EE 1+ ELRGLK+ +IKE LY+RAIA+Y+QQ++KL+D EAI IEFQKVLILM+ 
Sbjct: 661 ARTSLVPEEFISAKELRGLKDDQII<EKLYQRALAIYDQQLSKLRDQEAIIEFQKVLILMI 720 

Query: 731 VDNQWTEHIDALDQLRNSVGLRGYAQNNPIVSYQSEGFRMFQDMIGSIEFDVTRTLMKAQ 790 

VDN+WTEHIDALDQLRN+VGLRGYAQNNP+V3YQ+EGF+MFQDMIG+IEFDVTRT+^1KAQ 
Sbjct: 721 VDNKWTEHIDALDQIiRNAVGLRGYAQtlKPWEYQAEGFKMFQDMIGAIEFDVTRTMMKAQ 780 

Query: 791 IHEQERERASQHATTTAEQNISAQHVPMNNESPEYQGIKRNDKCPCGSGMKFKNCHG 847 

IHEQERERASQ ATT A QNI +Q ++ P+ ++RN+ CPCGSG KFKNCHG 
Sbjct: 781 1HEQERERASQRATTAAPQNIQSQQSANTDDLPK VERNEACPCGSGKKFKNCHG 834 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1166 

A DNA sequence (GBSx1242) was identified in S.agalactiae <SEQ ID 3621> which encodes the amino 
acid sequence <SEQ ID 3622>. This protein is predicted to be phospho-2-dehydro-3-deoxyheptonate 
aldolase (aroH). Analysis of this protein sequence reveals the following: 
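
Every example pairs a nucleotide sequence identifier with the identifier of the encoded amino acid sequence in the form `<SEQ ID 3621> ... <SEQ ID 3622>`. As a minimal sketch, assuming identifiers always appear inside angle brackets after the words `SEQ ID`, they could be collected as follows; the function name `extract_seq_ids` is illustrative, and stray spaces inside a number are removed before conversion.

```python
import re

SEQ_ID_RE = re.compile(r"<SEQ\s+ID\s+([\d ]+?)\s*>")


def extract_seq_ids(text: str) -> list:
    """Return all SEQ ID numbers in order of appearance."""
    return [int(raw.replace(" ", "")) for raw in SEQ_ID_RE.findall(text)]


ids = extract_seq_ids(
    "A DNA sequence (GBSx1242) was identified in S.agalactiae <SEQ ID 3621> "
    "which encodes the amino acid sequence <SEQ ID 3622>."
)
print(ids)  # [3621, 3622]
```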

d N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3429 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF40753 GB:AE002387 phospho-2-dehydro-3-deoxyheptonate 
aldolase, phe-sensitive [Neisseria meningitidis MC58] 
Identities = 122/348 (35%), Positives = 187/348 (53%), Gaps = 32/348 (9%) 

MGFHQLSDKINIEILKQKTSLDLEVSQKKIiAKE EELKNII KGEDQRFLVI V 51 

M H +D I 1+ +K+ + + ++KE +E+ +++ G D+R LVI+ 

MTHHYPTDDIKIKEVKELLPPIAHLYELPISJCEASGLVHRTRQEISDLVHGRDKRLLVII 60 

GPCSADNPKAVIjTYAKRIJWLEAAFKDKI^FLVMRVYTAKPRTNGDGYKGLVHHSDKLGVF 111 
GPCS +PKA L YA+RL KL +++++ +VMRVY KPRT G+KGL++ G F 



Query: 


1 


Sbjct: 


1 




52 


Sbjct: 


61 




112 


Sbjct: 


120 


Query: 


166 


Sbjct: 


179 




221 


Sbjct: 


239 




281 



PVG KN T GNL++ +A+ AA + K V T GN HVILRG 



+++D +H NS K + 



! ++ES+L +GRQDKP+V+GKSITD C+GW 
Sbjct: 292 E QDGGNIMGVMVESHLVEGRQDKPEVYGKSIIDACIGWGATEELL 336 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3623> which encodes the amino acid 
sequence <SEQ ID 3624>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1171 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 52/233 (22%) , Positives = 93/233 (39%) , Gaps = 40/233 (17%) 

Query: 50 IVGPCSADNPKAVLTYAKRLAKLEAAFKDKl'IFLVt'IRVyTAKPRTNGDGYKGLVHHSDKLG 109 
          IVGPCS ++ + A KL + R KPRT+ ++GL 
Sbjct: 19 IVGPCSIESYDHIRLAASSAKKLGYNY - -FRGGAYKPRTSAASFQGLG 64 

Query: 110 VFFQARKMHYDI IRETGLLTADELLYPEMLSV MDDLVSYYA1GARSVEDQGHRF1SSGID 169 
           Q + 4++ +E GLL+ E++ L D + +GAR++++ S ID 
Sbjct: 65 --LQGIRYLHEVCQEFGLLSVSEIMSERQLEH AYDYLDVIQVGARNMQNFEFLKTLSHID 122 



Query: 170 APVGMKNPTSGNLRVMFNAVYAAQNQQELFYQNKQVRTDGNLLSHVIL--RGYHNADYRS 227 

P+ K + A+ Q+ + S++IL RG D 
Sbjct: 123 KPILFKRGLMATIEEYLGALSYLQDTGK SNIILCERGVRGYD 164 

Query: 228 I PNYHYENLLETITHYEETDLQNPFI WDTNHDNSGKQ- FLEQIRI VKS VLAD 279 

+ + +++ ++TDL I+VD +H + L +1 K+V A+ 
Sbjct: 165 VETRNMLD IMAVP I IQQKTDLP 1 IVDVSHSTGRRDLIiLPAWCIAKAVGAN 214 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1167 

A DNA sequence (GBSx1243) was identified in S. agalactiae <SEQ ID 3625> which encodes the amino
acid sequence <SEQ ID 3626>. This protein is predicted to be AcpS (acpS). Analysis of this protein 
sequence reveals the following: 

3 N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3620 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG22706 GB:AF276617 acyl carrier protein synthase; AcpS
[Streptococcus pneumoniae]
Identities = 61/117 (52%), Positives = 90/117 (76%), Gaps = 1/117 (0%)



Query: 61 YSKALGTGIGKVNFHDIEILSDDKGAPLITXEPFKGKSFVSISHSGNYAQASVILEE 117

+SKA+GTGI K+ F D+E+L++44GAP +4 PF4GK 4+SISH4 4 ASVILEE 
Sbjct: 60 FSKAMGTGISKLGFQDLEVLKNERGAPYFSQAPFSGKIWLSISHTDQFVTASVILEE 116 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3627> which encodes the amino acid
sequence <SEQ ID 3628>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2001 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 76/119 (63%), Positives = 99/119 (82%), Gaps = 1/119 (0%) 

Query: 1 MIVGHGIDLQEIEAITKAYERNQRFAERVLTEQELLLFKGISNPKRQMSFLTGRWAAKEA 60 

MIVGHGIDLQEI AI K Y+RN RFA+++LTEQEL +F+ KR++++L GRW+ KEA 

Sbjct: 1 MIVGHGIDLQEISAIEKVYQRNPRFAQKILTEQELAIFESFPY-KRRLNYLAGRWSGKEA 59 

Query: 61 YSKALGTGIGKVNFHDIEILSDDKGAPLITKEPFNGKSFVSISHSGNYAQASVILEEEK 119 

++KA+GTGIG++ F DIEIL+D +G P++TK PF G SF+SISHSGNY QASVILE++K 
Sbjct: 60 FAKAIGTGIGRLTFQDIEIDMDvRGCPILTKSPFKGNSFISISHSGNYVQASVILEDKK 118 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
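
The summary line that heads each alignment (for example, "Identities = 61/117 (52%), Positives = 90/117 (76%), Gaps = 1/117 (0%)") can be read mechanically. The following is a minimal illustrative sketch and not part of the original analysis: the function name parse_alignment_summary is hypothetical, and the percentages are assumed to be integer-truncated ratios, which agrees with the figures printed for this particular line.

```python
import re

# Hypothetical helper (not from the source document): matches one statistic
# such as "Identities = 61/117" and ignores the printed percentage, so stray
# spaces like "(52%) ," in the listing do not matter.
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {name: (count, aligned_length, percent)} for one summary line."""
    stats = {}
    for name, count, length in _STAT.findall(line):
        count, length = int(count), int(length)
        # Assumption: percent is the truncated integer ratio.
        stats[name] = (count, length, (100 * count) // length)
    return stats


summary = parse_alignment_summary(
    "Identities = 61/117 (52%) , Positives = 90/117 (76%) , Gaps = 1/117 (0%)"
)
for name, (count, length, percent) in summary.items():
    print(f"{name}: {count}/{length} = {percent}%")
```

Recomputing the percentage from the two counts, rather than trusting the printed figure, gives a simple consistency check on a listing that has passed through character recognition.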

Example 1168 

A DNA sequence (GBSx1244) was identified in S. agalactiae <SEQ ID 3629> which encodes the amino
acid sequence <SEQ ID 3630>. Analysis of this protein sequence reveals the following:
Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Likelihood = -3.24 Transmembrane 78 - 94 ( 77 - 97)

Final Results

bacterial membrane --- Certainty=0.2296 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 








GFCVSN+DEAIEIjRQAG+ K IL+LGV E V LAK + TLTVA LEW++ 



L+GL VH+K+DSGMGRIG R+ E + L + G V+GIFTHFATADE 



i+LES LVHVK + G+ +GYGATYQ ++ + TVPIGYADGWTRDMQ FSV+V+G+ 
Query: 301 LCEIIGRVSMDQMTIRLPQKYTIGTKVTLIGQQGSCNITTTDVAQKRQTINyEVLCLLSD 360

C I+GRVSMDQ+TIRLP+ Y +GTKVTLIG G IT T VI R TINYEV+ CLLSD 
Sbjct: 301 ACPIVGRVSMDQITIRLPKLYPLGTKVTLIGSNGDKEITATQVATyRVTINyEWCLLSD 360 

Query: 361 RIPRYY 366 

RIPR Y 
Sbjct: 361 RIPREY 366 
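
The "Identities" figure reported for an alignment is the number of aligned columns in which the Query and Sbjct rows carry the same residue. As a small illustration only (the function name count_identities is hypothetical and the code is not part of the original analysis), the final block above, RIPRYY against RIPREY, can be checked as follows:

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue.

    Both arguments are rows of one alignment block, so they must have the
    same length; '-' marks a gap and never counts as an identity.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Rows taken from the last block of the alignment above (positions 361-366).
matches = count_identities("RIPRYY", "RIPREY")
print(f"{matches} of 6 positions identical")
```

Only the fifth column differs (Y against E), which is also where the middle line of that block shows a blank.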

A related DNA sequence was identified in S. pyogenes <SEQ ID 3631> which encodes the amino acid
sequence <SEQ ID 3632>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.34 Transmembrane 82 - 98 ( 82 - 98)

Final Results

bacterial membrane --- Certainty=0.1935 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD51027 GB:AF171873 alanine racemase [Streptococcus pneumoniae] 
Identities = 222/366 (60%) , Positives = 273/366 (73%) 

Query: 1 MISS FHRPTVARVNLQAI KENVASVQKHI PLGVKTYA WKADAYGHGAVQVSKALLPQVD 60 
M +S HRPT A ++L AI++N+ + HIP G AWKA+AYGHGAV V+KA+ VD

Sbjct: 1 MKASPHRPTKALIHLGAIRQNIQQMGAHIPQGTLIOAWKANAYGHGAVAVAKAIQDDVD 60 

Query: 61 GYCTSNLDEALQLRCAGIDKEILILGVI&PNEI^ 120 
G+CVSN+DEA++LRQAG+ K ILILGV + LA T+T+A L+WI ++ + 

Sbjct: 61 GFCTSNIDEAIELRQAGLSKPILILGVSEIEAVALAKEYDFTL'rVAGLEWIQALLDKEVD 120

Query: 121 CC^LKvHVKVDSGMGRIGLRSSKEVNLLIDSLKELGADVEGIFTHFATADEADDTKFNQQ 180 

GL VH+K+DSGMGRIG R + EV D L++ G VEGIFTHFATADE D FN Q 
Sbjct: 121 LTGLTVHLKIDSGMGRIGFRFASEVEQAQDLLQQHGVCVEGIFTHFATADEESDDYFNAQ 180 


Query: 181 LQFFKKLIAGLEDKPRLVHASNSATSIWHSDTIFNAVRLGIVSYGLNPSGSDLSLPFPLQ 240 

L+ FK ++A +++ P LVHASNSAT++WH +TIFNAVR+G YGLNPSG+ h LP+ L 
Sbjct: 181 LERFKTIIASMKEVPELVHASNSATTLWHVETIFNAVRMGDAMYGLNPSGAVLDLPYDLI 240 

Query: 241 ALSLESSLVHVKMISAGDTVGYGATYTAKKSEYVGTVPIGYADGWTRNMQGFSVLVDGQ 300

AL+LES+LVHVK + AG +GYGATY A +4- TVPIGYADGWTR+MQ FSVLVDGQ 
Sbjct: 241 PALTLESALVHVKTVPAGACMGYGATYQADS3QVIATVPIGYADGWTRDMQNFSVLVDGQ 300 

Query: 301 FCEIIGRVSMDQLTIRLPKAYPLGTKOTLIGSNQQKN1STTDIANYRNTINYEVLCLLSD 360 
C I+GRVSMDQ+TIRLPK YPLGTKVTLIGSN K 1+ T +A YR TINYEV+CLLSD

Sbjct: 301 ACPIVGRVSMDQITIRDPKLYPLGTiCVTLIGSNGDKEITATQVATYRVTINYEWCLLSD 360 

Query: 361 RIPRIY 366 
RIPR Y 

Sbjct: 361 RIPREY 366

An alignment of the GAS and GBS proteins is shown below. 

Identities = 247/366 (67%) , Positives = 295/366 (80%) 

Query: 1 MISSYHRPTRALIDLEAIANNVKSVQEHIPSDKKTFAWKANAYGHGAVEVSKYIESIVD 60

MISS+HRPT A ++L+AI NV SVQ+HIP KT+AWKA+AYGHGAV+VSK + VD 
Sbjct: 1 MISSFBIRPTVARVNLQAIKFJWASVQKHIPI^VKTYAvVKADAYGHGAVQVSKALLPQVD 60 

Query: 61 GFCVSNLDEAIELRQAGIVKMILW^GVVMPEQvILAK^ 120 
G+CVSNLDEA++LRQAGI K IL+LGV++P ++ LA IT+T+ASL+W+ L + +

Sbjct: 61 GYOTSNLDFALQLRQAGIDKEILILGVLLPNEI^LAA/ANAITVTIASLDWIAIiARLEKKE 120 



Query: 121 LSGLEVHIKVDSGMGRIGVRQLDEGNKLISEL3ESGASVKGIFTHFATADEADNCKFNQQ 180 
GL+VH+KVDSGMGRIG+R E N LI L E GA V+GIFTHFATADEAD+ KFNQQ 
Sbjct: 121 CQGLKOTVKVDSGMGRIGLRSSKEWILLIDSLICELGAD^GIFTHFATADEADDTKFNQQ 180



L FFK I+GL++ P LVHASNSATS+WHS+TIFNAVRLG+V YGLNPSG+DL LP+P+ 



ALSLES LVHVK + G VGYGATY E+VGTVPIGYADGWTR+MQGFSV+V+G+ 



CEIIGRVSMDQ+TIRLP+ Y +GTKVTLIG NI+TTD+A R TINYEVLCLLSD 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1169 

A DNA sequence (GBSx1245) was identified in S. agalactiae <SEQ ID 3633> which encodes the amino
acid sequence <SEQ ID 3634>. This protein is predicted to be immunogenic secreted protein precursor. 
Analysis of this protein sequence reveals the following: 
Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 





Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



There is also homology to SEQ ID 1988. 

A related GBS gene <SEQ ID 8745> and protein <SEQ ID 8746> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 8.81 
GvH: Signal Score (-7.5): 0.659999 

Possible site: 27 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 1.06 threshold: 0.0 
PERIPHERAL Likelihood = 1.06 247 
modified ALOM score: -0.71 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



SEQ ID 8746 (GBS98) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 5; MW 80kDa). 

GBS98-His was purified as shown in Figure 192, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1170 

A DNA sequence (GBSx1246) was identified in S. agalactiae <SEQ ID 3635> which encodes the amino
acid sequence <SEQ ID 3636>. This protein is predicted to be junction specific DNA helicase (mmsA) 
(recG). Analysis of this protein sequence reveals the following: 

Possible site: 17 



Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA90280 GB:Z49988 MmsA [Streptococcus pneumoniae] 
Identities = 483/671 (71%) , Positives = 568/671 (83%) 

Query: 1 MLLQSPISNLKGFGPKSAEKFQKLDiyTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 60 

ML P+ L G GPKSAEK+ KL I ++DLLLY+PFRYEDFK+K V +L DGEKAV++ 
Sbjct: 1 ^HQPLHVlPGVGPKSAEKYAKLGIENLQDLLIiYFPFRYEDFKTKQVLELEDGEKAVLS 60 

Query: 61 GLVVTPAOTQYYGFKRNRLSFKLRQGEAvLWSFFNQPYIjADKIELGQEVAVFGKJroA'EK 120 

G WTPA+VQYYGFKRNRL F L+QGE V V+FFNQPYLADKIELG +AVFGKWD K 
Sbjct: 61 GQVWPASVQYYGFKRNRLRFSLKQ^EWFAVNFFNQPYLM 120 

Query: 121 SAITGMKVLAQVEDDMQPVYRVAQGISQSTLIKAIKSAFEISAHLELKENLPATLLEKYR 180 

+++TGMKVLAQVEDD+QPVYR+AQGISQ-t-+L+K IK+AF+ L ++ENLP +I1I1+KY+ 
Sbjct: 121 ASLTGMKOTAQVEDDLQPVYRLAQGISQASLVKVIKTAFDQGLDLLIEENLPQSLIiDKYK 180 

Query: 181 IM3RSQACLAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLKSENKSETNGLPILYSKH 240 

LM R QA AMHFPKD+ EYKQALRRIKF ELFYFQM LQ LKSEN+ + +GL + +S+ 
Sbjct: 181 LMSRCQAVRAMHFPKDI^YKQALRRIKFAELFYFQMQLQTLKSENRVC^SGLraWSQE 240 

Query: 241 AMETKISSLPFILTNAQI^SLDEILSDMSSGAHVINRLLQGDVGSGK'rVIAGLSMYAAyTA 300 

+ +SLPF LT AQ4-+SL EIL+DM S HMNRLLOGDVGSGKTV+AGL+M+AA TA 
Sbjct: 241 KVTAVKASLPFALTQAQEKSLQEILTDMKSDHH>1NRLLQGDVGSGKTVVAGLAMFAAVTA 300 

Query: 301 GFQSALMVmEIIiAEQHYISLQELFPDLSIAILTSGMKAAVI«TVIAAIAHGSVDMIVGT 360 

G+Q+ALMVPTEILAEQH+ SLQ LFP+L +A+LT +KAA KR VL IA G D+I+GT 
Sbjct: 301 GYQAALMVPTEILAEQHFESLQNLFPNLKLALLTGSLI<aAEKREVLETIAKGEADLIIGT 360 

Query: 361 HALIQDSVQYHKLGLVITDEQHRFGVKQRRI FREKGENPD VLMMTATPI PRTLAITAFGE 420 

HALIQD V+Y +LGL+I DEQHRFGV QRRI REKG+NPDVLMMTATP I PRTLAITAFG+ 
Sbjct: 361 HALIQDGVEYARLGLIIIDEQHRFGVGQRRILREKGDNPDVLI1MTATPIPRTLAITAFGD 420 

Query: 421 MDVSIIDELPAGRKPIITRWVKEEQLGTVLEWVKGELQKDAQVYVISPLIEESEAL.DLKN 480 

MDVS I ID++PAGRKPI +TRW+KHEQL VL W++GE+QK +Q YVISPLIEESEALDLKN 
Sbjct: 421 MDVSIIDQMPAGRKPIVTRWIKHEQLPQVLTWLEGEIQKGSQAYVISPLIEESEALDLKN 480 

Query: 481 AVALHAELSTYFEGIAKVALVHGFilKiroEKDAIMQDFKI)iaCSHILVSTTVIEVGVNVPNA 540 

A+AL EL+T+F G A+VAL+HGRMK+DEKD IMQDFK++K+ ILVSTTVIEVGVNVPNA 
Sbjct: 481 AIALSEELTTHFAGKAEVALLHGRMKSDEKDQIMQDFKERKTDILVSTTVIEVGVNVPNA 540 

Query: 541 TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVLA 600 

T+MIIMDADRFGLSQLHQLRGRVGRG KQSYAVLVAKPKTDSGK RM IMTETT+GFVLA 
Sbjct: 541 TVMIIMDADRFGLSQLHQIEGRVGRGDKQSYAVLTONPKTDSGKDRMRIMTETTNGFVLA 600 

Query: 601 ESDLKMRGSGEIFGTRQSGIPEFQVADIWDYPILEFJUIRVASDIVKDNKWKENTEWALI 660 

E DLKMRGSGEIFGTRQSG+PEFQVADI+ED+PILEEAR+VAS I W+E+ EW +1 

Sbjct: 601 E 
Query: 661 LDNLRQHSDFD 671

+Ii + D 
Sbjct: 661 ALHLEKKEHLD 671 



A related DNA sequence was identified in S. pyogenes <SEQ ID 3637> which encodes the amino acid
sequence <SEQ ID 3638>. Analysis of this protein sequence reveals the following: 



Possible site: 17 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 530 - 546 ( 530 



Final Results 

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 641/671 (95%) , Positives = 655/671 (97%) 

Query: 1 MLLQSPISNLKGFGPKSAEKFQKLDIYTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 60 

M+L +P+SNLKGFGPKSAEKFQKLDIYTWDLLLYYPFRYEDFKBKSVFDLVDGEKAVIT 
Sbjct: 1 MILTAPMSNLKGFGPKSAEKFQKLDIYTVEDLLLYYPFRYEDFKSKSVFDLVDGEKAVIT 60 

Query: 61 GLWTPANVQYYGFICRNRLSFKLRCGFJOTjWSFTOQPYIiADKIELGQEVAVFGKWDATK 120 

GLVVTPANVQYYGFKl^LSFKLRQGEAVIjNVSFFNQPYLADKIELGQEVAVFGKMDATK 
Sbjct: 61 GLVVTPANVQYYGFKRNRLSFKLRQGFAVIOTSFFNQPYLADKIELGQEVAVFGKWDATK 120 

Query: 121 SAITGMKVIAQVEDDMQPVYRvAQGISQSTLIKAIKSAFEISAHLEIjKENLPATLLEKYR 180 

SAITGMKVLAQVEDDMQPVYRVAQGISQSTLIKAIKSAFEI AHLELKENLPATLLEKYR 
Sbjct: 121 SAITGMKVLAQVEDDMQPVYRVAQGISQSTLIKAIKSAFEIDAHLELKENLPATLLEKYR 180 

Query: 181 LMGRSQACIaAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLKSENKSETNGLPILYSKH 240 

LMGRSQACLAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLK+ENKSETNGLPILYSK 
Sbjct: 181 LMGRSQACLAMHFPKDITEYKQALRRIKFEELFYFQMNLQVLKAENKSETNGLPILYSKR 240 

Query: 241 AMETKISSLPFILTNAQKRSLDEILSDMSSGAHMNRLLQGDVGSGKTVIAGLSMYAAYTA 300 

AMETKISSLPFILTNAQKRSLD+ILSDMSSGAHMNRLLQGDVGSGKTVIAGLSMYAAYTA 
Sbjct: 241 AMETKISSLPFILTNAQKRSLDDILSDMSSGAHMNRLLQGDVGSGKTVIAGLSMYAAYTA 300 

Query: 301 GFQSALMVPTEII^QHYISLQELFPDLSIAILTSGMKAAVKRTVLAAIANGSVXIMIVGT 360 

GFQSAL>WPTEILAEQHYISLQELFPDLSIAILTSGMKAAVKRTVIjAAIANGSVDMIVGT 
Sbjct: 301 GFQSALMVPTE I LAEQHYI SLQELFPDLS I AI LTSGMKAAVKRTVLAAIANGSVDMI VGT 360 

Query: 361 HALIQDSVQYHKLGLVITDEQHRFGVKQRRIFREKGENPDVLMMTATPIPRTLAITAFGE 420 

HALIQDSVQYHKLGLVITDEQHRFGVKQRRIFREKGENPDVLI1MTATPIPRTLAITAFGE 
Sbjct: 361 HALIQDSVQYHKLGLVITDEQHRFGVKQRRIFREKGENPDVLMMTATPIPRTLAITAFGE 420 

Query: 421 MDVSIIDELPAGRKPIITRWVKHEQLGTVLEWVKGELQKDAQVYVISPLIEESEALDLKN 480

MDVSIIDELPAGRKPI+TRWKHEQLGTVLEWVKGELQKDAQVYVISPLIEESEALDLKN 
Sbjct: 421 MDVSIIDELPAGRKPItWRWVKHEQLGTVLEWVKGELQKDAQVYVISPLIEESEALDLKN 480 

Query: 481 AVALHAELSTYFEGIAKVALVHGRMKNDEKDAIMQDFKDKKSHILVSTTVIEVGVNVPNA 540 

AVALHAELSTYFEGIAKVALVHGRMKNDEKDAIMQDFKDKKSHILVSTTVIEVGVNVPNA 
Sbjct: 481 AVALHAELSTYFEGIAKVALVHGRMraiDEKDAIMQDFKDKKSHILVSTTVIEVGVNVPNA 540 

Query: 541 TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVXA 600 

TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVLVANPKTDSGKKRMTIMTETTDGFVLA 
Sbjct: 541 TIMIIMDADRFGLSQLHQLRGRVGRGYKQSYAVXVANPKTDSGKKRMTIMTETTDGFVLA 600 

Query: 601 ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEARRVASDIVKDNNWKENTEWALI 660 

ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEAR+V++ IV D NW +W L+ 
Sbjct: 601 ESDLKMRGSGEIFGTRQSGIPEFQVADIVEDYPILEEARKVSAAIVSDPNWIYEKQWQLV 660 



Query: 661 LDNLRQHSDFD 671 
N+R+ +D
Sbjct: 661 AQNIRKKEVYD 671

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1171 

A DNA sequence (GBSx1247) was identified in S. agalactiae <SEQ ID 3639> which encodes the amino
acid sequence <SEQ ID 3640>. This protein is predicted to be aryl-alcohol dehydrogenase (bl647). 
Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1562 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10105> which encodes amino acid sequence <SEQ ID
10106> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 



Query: 7 IGQTGIQATKIALGt'I 1PI l.i. ] WGT 'U^DLGINFFDHADIYGGGLSELRFRDAI 66 

+G + ++ +A+GCMR++ + K+AE V TAL+ G NFFDHADIYGGG E F DAI 
Sbjct: 6 LGSSSLEVPWAVGCMRINAISKKEAERFVQTALEQGANFFDHADIYC-GGECEEIFADAI 65 

Query: 67 KHLNVNRDKMIIQSKCGIREGYFDFSKEYILSSVDGILERLGTEYLDFLILHRPDVLVEP 126 

+ R+K+ 1 +QSKCGIREG FDFSKEYIL SVDGIIi+RL T+YLD L+LHRPD LVEP 

Sbjct: 66 QMNEAVREKIILQSKCG1REGRFDFSKEYILQSVDGILQRLKTDYLDVLLLHRPDALVEP 125 

Query: 127 EEVAEAFTKLRAEGKVKHFGVSNQNRFQMELLQSYLDEPLAVNQLQLSPAHTPMFDAGLN 186 

EEVAEAF L + GKV+HFGVSNQN Q+EIiL+ ++ +P+ NQLQLS + M +G+N 
Sbjct: 126 EEVAEAFDLLESSGKVRHFGVSNQNPMQIELLKKFVRQPIVANQLQLSITNATMISSGIN 185 

Query: 187 VOT^LNKASIEHDDGIVDYCRLKRVTIQAWSPFQIDLSRGLFVNHPDYI^ELNETIAKLAKN 246 

VNM N+++I D ++DYCRL VTIQ WSPFQ G+F+ + + EEN+ I 

Sbjct: 186 VNMENESAINRDGS^TLDYCRLHDVTIQPWSPFQYGFFEGVFLGNDLFPELNKKIDELAEK 245 

Query: 247 YNVSSEA1VIAW1LRHPAKMQA1VGSMNPSRLKAIDKANDIALTRKEWYDIYRSAGNILP 306 

Y VS+ I IAW+LRHPA MQ ++G4MN Rt,K KA++I LTR+EWY+IYR+AGNILP 
Sbjct: 246 YEVSNTTIAIAV^ J LRHPANMQPVIGT^INLKRLKDCCKASEIRLTREEWYEIYRAAGNILP 305 

There is also homology to SEQ ID 780. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1172 

A DNA sequence (GBSx1248) was identified in S. agalactiae <SEQ ID 3641> which encodes the amino
acid sequence <SEQ ID 3642>. This protein is predicted to be shikimate 5-dehydrogenase (aroE) (aroE). 
Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0988 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC74762 GB:AE000264 putative oxidoreductase [Escherichia coli K12] 
Identities = 114/279 (40%), Positives = 171/279 (60%), Gaps = 3/279 (1%) 

Query: 10 LTGLIANPARHSLS PLMWNTS FQEKNMNYAYLT FE VEEGKLTEAWGVRALGIRGVNVSM 69 
h Gh+h P RHSLSP M N + ++ + + Y+ FEV+ A+ G++AL +RG VSM

Sbjct: 9 LIGLMAYPIRHSLSPEMQNKALEKAGLPFTYMAFEVDNDSFPGAIEGLKALKMRGTGVSM 68 

Query: 70 PFKQSVIPLLDDLSPQAKLVGAVNTIWQGGTGRLVGHMTDGIGCFKALAAQGFSAKNKI 129 
P KQ +D+L+P AKLVGA+NTIVN G R G+ TDG G +A+ GF K K 

Sbjct: 69 PNKQLACEYVDELTPAAKLVGAINTIVNDDGYLR--GYNTDG1GHIRAIKESGFDIKGKT 126



Query: 130 ITIAGIGGSGKAVAVQAAMEGVAEIRLFNRNSSNYDKVIDLSDKIKKQFQIKVVVDYLEN 189 

+ + G GG+ A+ Q A+EG4 EI+LFNR +DK +++++ VVL + 

Sbjct: 127 MVIjLGAGGASTAIGAQGAIEGLKEIKLFNRRDEFFDKAIAFAQRVNENTDCVVTvTDIiM 186 

Query: 190 KTAFKDAIRTSHFYIDATSLGMRPLDNYSLINDPEILTPNLWVDLVYKPKETALLRFVR 249 

+ AF +A+ ++ + T +GM+PL+N SL+ND +L P L+V + VY P T LL+ + 
Sbjct: 187 QQAFAEALASADILTNGTKVGMKPLENESLVNDISLLHPGLLVTECVYNPHMTKLLQQAQ 246 

Query: 250 QNGVKHAYNGLGMLI YQGAEAFQLITNQEMP I SSVERVIi 288 

Q G K +G GML++QGAE F L T ++ P+ V++V+ 
Sbjct: 247 QAGCK-TIDGYGMLLWQGAEQFTLWTGKDFPLEYVKQVM 284 



A related DNA sequence was identified in S. pyogenes <SEQ ID 3643> which encodes the amino acid
sequence <SEQ ID 3644>. Analysis of this protein sequence reveals the following:

Possible site: 54
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC74762 GB:AE000264 putative oxidoreductase [Escherichia coli]

Identities = 132/280 (47%), Positives = 186/280 (56%), Gaps = 3/280 (1%)

Query: 11 LVSLLATPIRHSLSPKMHNEAYAKLGLDYAYLAFEVGTEQIADAVQGIRALGIRGSNVSM 70 
L+ L+A PIRHSLSP+M N+A K GL + Y+AFEV + A++G++AL +RG+ VSM 
Sbjct: 9 LIGLMAYPIRHSLSPEMQNKALEKAGLPFTYMAFEVDNDSFPGAIEGLKALKMRGTGVSM 68



Query: 71 PNKEAILPLLDDLSPAAELVGAVNTVVNKEGKGHLVGHITD3IGALRALADEGVSVKNKI 130 

PNK+ +D+L+PAA+LVGA+NT+VN DG +L G+ TDG G +RA+ + G +K K 

Sbjct: 69 PNKQtACEYVDELTPAAKLVGAINTIVNDDG--YLRGYNTDGTGHIRAIKESGFDIKGiCr 126 

Query: 131 ITIAGVGGAGKAIAVQIAFDGAKETOLFNRQATRLSSVQKLVTKMQLTRTKVTLQDLED 190 

+ L G GGA AI Q A +G KE++LFNR+ ++N+ T VT+ DL D 

Sbjct: 127 MVLLGAGGASTAIGAQGAIEGLKEIKLFNRRDEFFDKAIAFAQRVNENTDCVVTVTDLAD 186 

Query: 191 QTAFKEAIRESHLFIDATSVGMKPLENLSLITDPELIRPDLWFDIVYSPAETKLLAFAR 250 

Q AF EA+ + + + T VGMKPLEN SL+ D h+ P L+V + VY+P TKLL A+ 
Sbjct: 187 QQAFAEAIASADILTNGTKVGMKPLFjNESLVITOISLLHPGLbVTECVYNPHMTKLLQQAQ 246 

Query: 251 QHGAQKVINGLGMVIiYQjSAEAFKLITGQDMPVDAIKPLLG 290 

Q G K I+G GM+L+QGAE F L TG+D P++ +K ++G 
Sbjct: 247 QAGC-KTIDGYGMLLWQGAEQFTLWTGKDFPLEYVKQVMG 285 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 166/288 (57%), Positives = 221/288 (76%)

Query: 4 IiNGETLLTGIiIifflPJUlHSLSPLMWNTSFQEKNMOTAYLTFEVEEGKLTEJWRGVRALGIR 63 

L+G TLL L+A P RHSLSP M N ++ + 44YAYL FEV 4L +AV+G+RALGIR 
Sbjct: 5 LSGHTLLVSLIATPIRHSLSPKMHNEAYAKLGLDYAYLAFEVGTEQLADAVQGIRALGIR 64 

Query: 64 GVWSMPFKQSVIPLLDDLSPQAKLVGAVlSrriWQGGTGRLVGHMTDGIGCFKALARQGF 123 

G NVSMP K4444PLLDDLSP A4LVGAVNT+VN+ G G LVGH4TDGIG 4ALA 4G 
Sbjct: 65 GSm/SMPNKEAILPLLDDLSPAAELVGAvNTVWKMKGHLVGHITDGIGMRAIADEGV 124 

Query: 124 SAKNKI1TIAGIGGSGKAVAVQAAMEGVAEIRLFMPJISSKIYDKVIDLSDKIKKQFQIKVV 183 

S KNK1IT+AG+GG+GKA+AVQ A +G E4RLFNR +4 V L K4 4 4 KV 
Sbjct: 125 SVKNKIlTIAGVGGAGKAIAVQIAFDGAKEVRLFiffiQATRLSSVQKLVTKIjNQLTRTKVT 184 

Query: 184 VDYLENKTAFKDAIRTSHFYIDATSLGMRPLDNYSLINDPEILTPNLVWDLVYKPKETA 243 

4 LE44TAFK4AIR SH 4IDATS+GM+PL4N SLI DPE44 P4LW D4VY P ET 
Sbjct: 185 LQDLEDQTAFKEAIRESHLFIDATSVGMKPLBNLSLITDPELIRPDLWFDIVYSPAETK 244 

Query: 244 LLRFVRQNGVKHAYNGLGMLIYQGAEAFQLITNQEMPISSVERVLQTE 291 

LL F RQ4G 4 NGLGM44YQGAEAF4LIT Q4MP4 444 4L E 
Sbjct: 245 LLAFARQHGAQKVINGLGMVLYQGAEAFKLITGQDMPVDAIKPLLGDE 292 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1173 

A DNA sequence (GBSx1249) was identified in S. agalactiae <SEQ ID 3645> which encodes the amino
acid sequence <SEQ ID 3646>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.16 Transmembrane 57 - 73 ( 53 - 76)

Final Results

bacterial membrane --- Certainty=0.3463 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
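
Each "Final Results" block lists a certainty for three candidate locations (cytoplasm, membrane, outside). The sketch below is illustrative only and is not part of the original analysis: the names parse_localization and best_site are hypothetical, and the regular expression is an assumption about the line layout seen in this listing, written to tolerate the stray spaces that appear inside the numbers.

```python
import re

# Assumed layout: "bacterial <site> --- Certainty=<value> (<label>) < succ>".
# The value group also accepts spaces so that "0 . 3463" is still read.
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)[\s\-]*"
    r"Certainty\s*=\s*([0-9][0-9 .]*?)\s*\(([^)]*)\)"
)


def parse_localization(lines):
    """Return {site: (certainty, label)} for the lines that match."""
    results = {}
    for line in lines:
        match = _RESULT.search(line)
        if match:
            site, value, label = match.groups()
            results[site] = (float(value.replace(" ", "")), label.strip())
    return results


def best_site(results):
    """Return the site with the highest certainty."""
    return max(results, key=lambda site: results[site][0])


# Values from the Final Results block of Example 1173 above.
results = parse_localization([
    "bacterial membrane --- Certainty=0.3463 (Affirmative) < succ>",
    "bacterial outside Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
])
print(best_site(results), results[best_site(results)])
```

For Example 1173 this selects the membrane entry, which is the one location marked "Affirmative" in the listing.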

Example 1174 

A DNA sequence (GBSx1250) was identified in S. agalactiae <SEQ ID 3647> which encodes the amino
acid sequence <SEQ ID 3648>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2333 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10103> which encodes amino acid sequence <SEQ ID 
10104> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05343 GB:AP001512 L-asparaginase [Bacillus halodurans]
Identities = 158/319 (49%), Positives = 214/319 (65%), Gaps = 4/319 (1%) 

MKKIIiVLHTGGTISMNfllTOKGQVMSSADNPMKYVDLSLDDli-DLTVVDFLNIjPSPQITPH 59 
MKK+LV+HTGGTI+M+ +EKG V NP+ SL + + V DFLN+PSP +TP 

MKKVLVIHTGGTIAMHEDEKGGVQPKETNPLFATVESL'TSIASIEVDDFLNIPSPHMTPE 60 

HMLDIYHYLKQHASN- - FDGWITHG7DTLEETAYFLDTMI LPKIPI I ITGAMRSTNELG 117 

M + LK N FDGWITHGTDTLEETAY LD ++ ++P+++TGAMRS+NELG 
LMFQLAERLKSRVGNESFDGWITHGTDTLEETAYLLDLLLDWEVPVWTGAMRSSNELG 120 

SDGVYNYLSALRVANSTKAADKGVIiVvMHDEIHAAKYVTKTHTraVSTFQTPTHGPLGI I 177 
+DG +N++SA++ A + +A KGVLW NDEIH AK VTKTHT+NV+TFQ+P +GP+GI+ 



/ ++XAYAGM D 4-+ + I GLVIEA G G 



RLKLLIAUIAGLTGQNLKD 316 
RLKLL+AL + L++ 

RLKLLVALELTTDRKKLQE 318 






















A related DNA sequence was identified in S. pyogenes <SEQ ID 3649> which encodes the amino acid
sequence <SEQ ID 3650>. Analysis of this protein sequence reveals the following:

Possible site: 16 
>>> Seems to have no N-terminal signal sequence 

Likelihood = -2.28 Transmembrane 245 - 261 ( 243 - 261) 



Final Results

bacterial membrane --- Certainty=0.1914 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05343 GB:AP001512 L-asparaginase [Bacillus halodurans]
Identities = 158/320 (49%), Positives = 218/320 (67%), Gaps = 5/320 (1%)

MKKILV1HTGGTISMQADNSGRWPNQDNPM-TKIHAAAQDIQLTVSDFLNLPSPHITPH 59
MKK+LV+HTGGTI+M D G V P + NP+ + + + V DFLN+PSPH+TP 

MKKA/LVIHTGGTIAMHEDEKGGVQPKETNPLFATVESLT3IASIEVDDFLNIPSPHMTPE 60



G+DG +N+++A++ A++D+AK KGVLW NDEIK AK VTKTHT+N++TFQ+P +GP+GI 








- ++KAYAGM DGS++ 



IPV+LVSR +G+ + YAYEGGG L++ GV+F LN K 



Query: 298 ARLKLLIALNAGLTGQELKD 317

An alignment of the GAS and GBS proteins is shown below.

Identities = 242/321 (75%), Positives = 275/321 (85%), Gaps = 1/321 (0%) 

Query: 1 MKKILVLHTGGTIS^ANEKGQVMSSADNPIV^^ 60 

MKKILVLHTGGTISM A+ G+V+ + DNPM + + D+ LTV DFLNLPSP ITPHH 
Sbjct: 1 MKKILVLHTGGTISMQADNSGRWPNQDNPMTKIHAAAQDIQLTVSDFLNLPSPHITPHH 60 

Query: 61 MLDIYHYLKQHASNFDGWITHGTDTLEETAYFLDTMILP-KIPIIITGAMRSTNELGSD 119 

ML IYH++++ FDG+VITHGTDTLEETAYFLDTM LP IP+++TGAMRS+NE+GSD 
Sbjct: 61 MLSIYHHIQERTDVFDGIVITHGTDTLEETAYFLDTMALPTNIPWLTGAMRSSNEVGSD 120 

Query: 120 GvYNYLSALRVANSTKAADKGVLWMtTOEIH^ 179 

G+YNYL+ALRVA+S KA +KGVLWMNDEIHAAKYVTKTHTTN+STFQTPTHGPLGIIMK 
Sbjct: 121 GIYI^LTALRVASSDKAKEKGVLVVI^EIHAA.KYVTKTHTTNISTFQTPTHGPLGIIMK 180 

Query: 180 QDLLFFKATEERVRFDLDKITGTVPIVKAYAGMGDSGIISFLNSQNISGLVIEALGAGNM 239 

DLLFFK E R+RFDL I+GT+PI+KAYAGMGD I+S L +1 GLVIEALGAGN+ 
Sbjct: 181 NDLLFFKTAEPRIRFDLRCISGTIPIIKAYAGMGDGSILSLLTPGSIQGLVIEALGAGNV 240 

Query: 240 PPKAAQEIEELIEQGVPVvLVSRCFNGIAEPVYGYEGGGAKLQESGVMFVKELNAPKARL 299 

PP A EIE LI G+PV+LVSRCFNG+AEPVY YEGGGA LQE+GVMFVKELNAPKARL 
Sbjct: 241 PPl^VGEIEHLIALGIPVILVSRCFKGMAEPVYAYEGGGAMLQEAGVMFVICELNAPKARL 300 

Query: 300 KLLIALNAGLTGQNLKDYIEG 320 

KLLIALNAGLTGQ LKDYIEG 
Sbjct: 301 KLLIALNAGLTGQELKDYIEG 321 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1175 

A DNA sequence (GBSx1251) was identified in S. agalactiae <SEQ ID 3651> which encodes the amino
acid sequence <SEQ ID 3652>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4427 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB85142 GB:AL162757 conserved hypothetical protein [Neisseria 
meningitidis Z2491] 
Identities = 87/285 (30%), Positives = 138/285 (47%), Gaps = 35/285 (12%) 

Query: 4 KAVFFDIDGTLLNDRKNVQKSTIK-AIRNLKDQGILVGLATGRG PSFVQPFLENLG 58 

K VFFDID TL + + ++K A+ L+ +GIL IATGR P V+ + G 

Sbjct: 11 KIVFFDIDDTLYRKYTDTLRPSWTAVAALRGKGILTALATGRSIATIPEKVRDMMAETG 70 

Query: 59 LDFAvTYNGQYIYSRSEIIYTNQLSKTTVYRLIRYAGARRREISLGTASGLLGSGIIGLG 118 

+D VT NGQ+ + + + + R+ + SLG +G G+ 

Sbjct: 71 MDAWTINGQFALLHGKTVCEVPMDAGLMGRVCAHLD SLGMDYAFVGGE- -GIA 122 

Query: 119 TSRLGQIVSSLVPRKWAKAIERSFKHFIRRIKPQNIDSLMVILREPIYQWLVATEGE-- 176 

S L + V R+ KH I +P+YQ+++ A E E 

Sbjct: 123 VSALSECVC RALKH IASDFFADKDYFSSKPVYQMLVFAEENEMP 166 

Query: 177 --SERIQKQFPRVKLTRSSPYSMDVISEGQSICVKSIERVGQRYGFDLSEVIAFGDSDNDI 234 
S+ ++++ +K R ++D++ G SK GI V + G ++++V+AFGD ND+
Sbjct: 167 LWSDIVERE--GLKTVRWHEEAVDLLPAGASKTIX3IRSVVEALGLEMOTVMAFGDGliNDV 224 

Query: 235 EMLSQVGIGVAMGNASQQVRENARYTTADNNDDGISKALAHYGLI 279 
EMLS+VG GVAMGN Q +E A+Y ++DG+ + L G+I

Sbjct: 225 EMLSEVGFGVAMGNGEQAAKEAAKYVCPGVDED3VLRGLQDLGVI 269 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3653> which encodes the amino acid
sequence <SEQ ID 3654>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6014 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 320/459 (69%), Positives = 391/459 (84%) 

Query: 1 MAIKAVFFD1DGTLLNDRKNVQKSTIKAIRNLKDQGILVGLATGRGPSFVQPFLENLGLD 60 

+ +KAVFFDIDGTLLNDRKN+QK+T KAI+ LK QG1 +VGIATGRGP FVQPFLEN GLD 
Sbjct: 1 LTVKAVFFDIDGTLLNDRKNIQKTTQKAIQQLKKQGIMVG1ATGRGPGFVQPFLENFGLD 60 

Query: 61 FAVTYNGQYIYSRSEI I YTNQLSKTTVYRLIRYAGARRREI SLGTASGLLGSGI IGLGTS 120 

FAVTYNGQYI +R +++Y NQJj K+ +Y++IRYA ++RE1SLGTASGL GS II +GTS 
Sbjct: 61 FAVTYNGQYILTRDKVLYQNQLPKSMIYIO/IRYANEKKREISLGTASGLaGSRIIDMGTS 120 

Query: 121 RLGQIVSSLVPRKWAKAIERSFKHFIRRIKPQNIDSLMVILREPIYQWLVATEGESERI 180 

GQ++SS VP+ WA+ +E SFKH IRRIKPQ+ +L+ I+REPIYQWLVA++ E+++I 
Sbjct: 121 PFGQVISSFVPKSWARTVEGSFKHLIRRIKPQSFRNLVTIMREPIYQVVLVASQAETKKI 180 

Query: 181 QKQFPRVKLTRSSPYSMDVISEGQSKVKGIERVGQRYGFDLSEVIAFGDSDNDIEMLSQV 240 

Q++FP +K+TRSSPYS+D+IS QSK+KGI3R+G+ +GFDLSEV+AFGDSDND+EMLS V 
Sbjct: 181 QEKFPHIKITRSSPYSLDLISVDQSKIKGIERLGEMFGFDLSEVMAFGDSDNDLEMLSGV 240 



Query: 241 GIGVAMGNASQQWENARYTTADNNDDGISKALAHYGLIQFEIEKTFSSRDENFNKVKSF 300 

GIG+AMGNA V++ A +TT NN+DGISKALAHYGLI F+IEK+F SRDENFNKVK F 
Sbjct: 241 GIGIAMGNAETWKDGAHFTTDSNl^GISKAIAHYGLIHFDIEKSFKSRDENFNKVKDF 300 


Query: 301 HLLmGETIETPRLYDSKFAGFRSDFKVEEIVEFLYAASQGNQKVFDQSIRNLHLAIDKA 360 

H LMD +TIETPR Y EAG+RS FKVEEIVEFLYAAS+G+Q+ F Q+I +LH A+D+A 
Sbjct: 301 HRLMDSDTIETPRSYTISEAGYRSGFKVEEIVEFLYAASKGDQQQFTQAIFDLHGAVDQA 360 

Query: 361 RDKVISKDHPETPLVGETOALTDLLYLTYGSFVLKGVDPKPLFDIVHEANMGKIFPDGKA 420

+KV +K H ETPL+G+VDAL DLLY TYGSFVLKGVDP+P+F+ VHEANM KIFPDGKA 
Sbjct: 361 ANKVQAKKHVETPLIGQTOALAE'LLYFTYGSFVLMGVDPQPIFFAVHEflNMAKIFPDGKA 420 

Query: 421 HFDPVTHKILKPDDVIEEHFAPEPSIRRELDSQIQKSLNR 459 
HFDPVTHKI KPD W+E APE +I++ELD Q+QKSL R

Sbjct: 421 HFDPVTHKIQKPDYWQERHAPEVAIKKELDKQLQKSLQR 459 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1176

A DNA sequence (GBSx1252) was identified in S. agalactiae <SEQ ID 3655> which encodes the amino
acid sequence <SEQ ID 3656>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1671 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10101> which encodes amino acid sequence <SEQ ID
10102> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06903 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 61/141 (43%) , Positives = 92/141 (64%) 

Query: 22 YERILVAIDGSTESEIAFEKAVNVALRNDSELILTHVIDTRALQSFATFDTYIYEKLEKE 81 

Y I LVA+DGST+ + + A KA N A ++L + HVID+R+ + +D + E + 
Sbjct: 2 YNHILVAVDGSTQAKRALYKAFNYAKEFKADLFICHVIDSRSFATVEQTO 61 

Query: 82 AKDVLEEYEKQAREKGADKVRQVIEFGNPKTLLAHBIPEKEKVDLINIVGATGLNTFERFX 141 

K +L+ Y ++A + G DKV +++FG+PK ++ I +K +DLI+ GATGIjN ERF 
Sbjct: 62 GKKXLQRYSEEAEKAGVDKVHTILDFGSPKANISKTIAQKYDIDLIITGATGIJIAVERFL 121 

Query: 142 IGSSSEYILRHAKVDLLIVRD 162 

+GS SE + RHAK D+LIVR+ 
Sbjct: 122 MGSVSESVARHAKCDVLIVRN 142 
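
The statistics line that heads each alignment can be sanity-checked by recomputing its percentages. The sketch below is an illustration only: it assumes that all fields on one line share the same alignment length and that printed percentages are whole numbers within one point of the truncated ratio (the one-point tolerance is an assumption based on the figures printed in this document, not a documented rule). The function name `check_stat_line` is hypothetical.

```python
import re

# Matches fields such as "Identities = 61/141 (43%)".
_STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\)"
)


def check_stat_line(line, tolerance=1):
    """Return a list of human-readable problems found in one statistics line."""
    fields = [(name, int(num), int(length), int(pct))
              for name, num, length, pct in _STAT_RE.findall(line)]
    problems = []
    lengths = sorted({length for _, _, length, _ in fields})
    if len(lengths) > 1:
        problems.append("alignment lengths differ: %s" % lengths)
    for name, num, length, printed in fields:
        recomputed = num * 100 // length  # truncated, not rounded (assumption)
        if abs(recomputed - printed) > tolerance:
            problems.append("%s: %d/%d gives %d%%, printed %d%%"
                            % (name, num, length, recomputed, printed))
    return problems


if __name__ == "__main__":
    print(check_stat_line("Identities = 61/141 (43%) , Positives = 92/141 (64%)"))
```

A check of this kind is useful mainly for spotting transcription errors, for example a denominator that differs between fields of the same line.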

A related DNA sequence was identified in S.pyogenes <SEQ ID 3657> which encodes the amino acid 
sequence <SEQ ID 3658>. Analysis of this protein sequence reveals the following: 

D N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1296 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/156 (75%) , Positives = 135/156 (86%) 

Query: 12 LEEDRLMSQKYERILVAIDGSTESELAFEKAVNvADRNDSELILTHVIDTRALQSFATFD 71 

L+ED MS KY+RILVAIDGS ESELAF K VNVALRND+ L+L HVIDTRALQS ATFD 
Sbjct: 25 LKEDSSMSLKYKEILVAIDGSYESEIAFNKGVNVALRNDATLLLVHVIDTRALQSVATFD 84 

Query: 72 TYIYEKLEKEAKDVLEEYEKQ7AREKGADKVRQVIEFGNPKTLLAHDIPEKEKVDIjIMVGA 131 

TYIYEKLE+EAKDVL+++EKQA+ G ++Q+IEFGNPK LLAHDIP++E DLIMVGA 
Sbjct: 85 TYI YEKLEQEAKDVLDDFEKQAQIAGITNI KQI IEFGNPKNLLAHDI PDRENADLIMVGA 144 

Query: 132 TGLNTFERFXIGSSSEYILRHAKVDLLIVRDPNKTM 167 

TGIjNTFER IGSSSEYI+RHAK+DLL+WD KT+ 
Sbjct: 145 TGIaNTFERLLIGSSSEYIMRHAKIDLLWRDSTKTL 180 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1177 

A DNA sequence (GBSx1253) was identified in S.agalactiae <SEQ ID 3659> which encodes the amino 
acid sequence <SEQ ID 3660>. This protein is predicted to be aspartate aminotransferase (aspC). Analysis 
of this protein sequence reveals the following: 

3 N-terminal signal sequence 
Final Results 

bacterial cytoplasm --- Certainty=0.2803 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKIFDKSMKLEHVAYDIRGPVLEEADRMRANGEKILRLNTGNPAAFGFEAPDEVIRDLIT 60 
M++F KS KLEHV YDIRGPV +EA R+ G KIL+LN GNPA FGFEAPDE++ D++ 
Sbjct: 1 MRLFPKSDKLEHVCYDIRGPVHKEALRLESEGNKILKLNIGNPAPFGFEAPDEILvDVLR 60 

Query: 61 NARESEGYSDSKGIFSARKAVMQYYQLQNI -HVDMDDIYIVNGVSEGISMSMQALLDNDD 119 
N ++GY DSKG++SARKA++QYYQ + I ++D+YI NGVSE I+M+MQALL++ D 
Sbjct: 61 NLPSAQGYCDSKGLYSARKAIVQYYQSKGILGATVMDVYIGNGVSELITMAMQALLNDGD 120 

Query: 120 EVLVPMPDYPLWTACVSLAGGNAVHYICDEEANWYPDIDDIKSKITSKTKAIVLINPNNP 179 
EVLVPMPDYPLWTA V+L+GG AVHY+CDE+ANW+P IDDIK+K+ +KTKAIV+ INPNNP 
Sbjct: 121 EVLVPMPDYPLWTAAVTLSGGKAVHYLCDEDANWFPTIDDIKAKVNAKTKAIVIINPNNP 180 

Query: 180 TGAVYPREILQEIVDIARQNDLIIFSDEVYDR 211 
TGAVY +E+LQEIV+IARQN+LIIF+DE+YD+ 
Sbjct: 181 TGAVYSKELLQEIVEIARQNNLIIFADEIYDK 212 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3661> which encodes the amino acid 
sequence <SEQ ID 3662>. Analysis of this protein sequence reveals the following: 
Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2936 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 170/212 (80%) , Positives = 193/212 (90%) , Gaps = 1/212 (0%) 

Query: 1 MKIFDKSMKLEHVAYDIRGPvLEEADRMRANGEKILRLNTGNPAAFGFEAPDEVIRDLIT 60 

MKI +KS KLEHVAYDIRGPVL+EA+RM A+GEKILRLNTGNPAAFGFEAPDEVIRDLI 
Sbjct: 13 MKIIEKSSKLEHVAYDIRGPVLDEANRMIASGEKILRLNTGNPAAFGFEAPDEVIRDLIV 72 








Query: 120 EVLVPMPDYPLWTACVSLAGGNAVHYICDEEANWYPDIDDIKSKITSKTKAIVLINPNNP 179 

EVLVPMPDYPLWTACVSL GG AVHY+CDEEA WYPDI DIKSKITS+TKAIV+ INPNNP 
Sbjct: 133 EVLVPMPDYPLWTACVSLGGGKAVHYLCDEEAGWYPDIADIKSKITSRTKAIVVINPNNP 192 

Query: 180 TGAVYPREILQEIVDIARQNDLIIFSDEVYDR 211 

TGA+YP+EIL++IV +AR++ LIIF+DE+YDR 
Sbjct: 193 TGALYPKEILEDIVALAREHQLIIFADEIYDR 224 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1178 

A DNA sequence (GBSx1254) was identified in S.agalactiae <SEQ ID 3663> which encodes the amino 
acid sequence <SEQ ID 3664>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-14.75 Transmembrane 38 - 54 ( 29 - 60) 

Final Results 

bacterial membrane --- Certainty=0.6901 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
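
An INTEGRAL line of the kind reported above (likelihood, predicted transmembrane segment, and a second range in parentheses) can be converted into a structured record. This is a minimal sketch under stated assumptions: the record type `TMSegment`, the function `parse_tm_segments`, and the field names `outer_start`/`outer_end` for the parenthesised range are all hypothetical labels chosen for illustration; the source does not define what the parenthesised range represents.

```python
import re
from typing import List, NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int  # first number inside the parentheses (meaning assumed)
    outer_end: int    # second number inside the parentheses (meaning assumed)


_TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text: str) -> List[TMSegment]:
    """Extract every INTEGRAL transmembrane entry found in the text."""
    return [
        TMSegment(float(lik), int(s), int(e), int(os_), int(oe))
        for lik, s, e, os_, oe in _TM_RE.findall(text)
    ]


if __name__ == "__main__":
    line = "INTEGRAL Likelihood =-14.75 Transmembrane 38 - 54 ( 29 - 60)"
    for seg in parse_tm_segments(line):
        # Segment length in residues, counting both end points.
        print(seg, seg.end - seg.start + 1)
```

Sorting the returned list by `likelihood` puts the most negative (strongest) prediction first, which matches the order in which such entries are listed in this document.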

A related GBS nucleic acid sequence <SEQ ID 9389> which encodes amino acid sequence <SEQ ID 9390> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3665> which encodes the amino acid 
sequence <SEQ ID 3666>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-15.97 Transmembrane 35 - 51 ( 25 - 58) 

Final Results 

bacterial membrane --- Certainty=0.7389 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 51/87 (58%), Positives = 63/87 (71%) , Gaps = 7/87 (8%) 

MAK+PWE+K+V++ + ' TR SR STPW+TA LS FFVI+VAILFI FYTSN G 
Sbjct: 1 MAKEPWEEKIVDDTIGTR TRKSRNAFISTPWLTALLSVFFVI IVAILFIFFYTSNSG 57 

Query: 61 EDRAKETSGFYGASSQKVNSSKTKKAS 87 

+R ET+GFYGAS+ K KT+KAS 
Sbjct: 58 SNRQAETNGFYGASTHK KTRKAS 80 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1179 

A DNA sequence (GBSx1255) was identified in S.agalactiae <SEQ ID 3667> which encodes the amino 
acid sequence <SEQ ID 3668>. Analysis of this protein sequence reveals the following: 
Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0815 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3669> which encodes the amino acid 
sequence <SEQ ID 3670>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0107 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 43/64 (67%) , Positives = 53/64 (82%) 

Query: 1 MKVALI PEKCIACGLCQTYSNI FDYQDDGIVKFSDTDNLEKEI PSSDQDTVLAVKSCPTK 60 

MKV++IPEKCIACGLCQTYS++FDY D+GIV FS + +1 SD+D +LAVKSCPTK 
Sbjct: 1 MKVSIIPEKCIACGLCQTYSSLFDYHDNGIVTFSSSSETSQSICPSDKDAILAVKSCPTK 60 

Query: 61 ALU 64 
ALT+ 

Sbjct: 61 ALTL 64 
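
The Identities and Positives counts reported for an alignment follow from its middle line: a letter marks an identical residue pair and a `+` marks a conservative substitution. The sketch below illustrates the counting on the first nine columns of the alignment above, with scanning spaces removed; the function name `count_matches` is hypothetical, and the sketch assumes the three strings are already column-aligned.

```python
def count_matches(query, midline, sbjct):
    """Return (identities, positives, gaps, length) for one aligned block.

    Assumes the three strings have equal length and are column-aligned,
    with '-' marking a gap in either sequence.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("aligned strings must have equal length")
    identities = sum(
        1 for q, m, s in zip(query, midline, sbjct)
        if m.isalpha() and q == s
    )
    # Positives include the identities plus every '+' column.
    positives = identities + midline.count("+")
    gaps = query.count("-") + sbjct.count("-")
    return identities, positives, gaps, len(query)


if __name__ == "__main__":
    print(count_matches("MKVALIPEK", "MKV++IPEK", "MKVSIIPEK"))
```

Summing these counts over every block of an alignment gives the numerators of the statistics line; the denominator is the total number of aligned columns.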

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1180 

A DNA sequence (GBSx1256) was identified in S.agalactiae <SEQ ID 3671> which encodes the amino 
acid sequence <SEQ ID 3672>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-10.61 Transmembrane 47 - 63 ( 41 - 69) 

Final Results 

bacterial membrane --- Certainty=0.5246 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC36851 GB:L23802 pore-forming peptide [Enterococcus faecalis] 
Identities = 42/130 (32%), Positives = 63/130 (48%), Gaps = 9/130 (6%) 

Query: 7 KIRYHWQPELSWAIIYWSIAIAPIFIGLSLLYERTE IPSQVFVLFAIFIVLVGIGFH 63 

K +++WQPEL+ IIYWS +FILLE I+VVF+FL G 

Sbjct: 3 KQKFYWQPELASTIIYWSCTFCILFISLILALENNGPYLISNLVMVPFFVFAYL---GIA 59 

Query: 64 RYFVIEEDGYLRIVSFNFLRRTKFPIEDIAKIEVTKSSVTIKFNNNHE- -RIFYMRKWPK 121 

R F + E L + + R+ P+ I K+ + S+ I + E ++F M+K 
Sbjct: 60 RSFM«ETS-LIWDVLWFRKKALPLSQIE:OTY1TOKSIEIFSSEFKEGSKVFLMKKKTD 118 

Query: 122 KYFLDALAIE 131 

FL+AL 1+ 
Sbjct: 119 SLFLEALKIK 128 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3673> which encodes the amino acid 
sequence <SEQ ID 3674>. Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.87 Transmembrane 47 - 63 ( 41 - 69) 
INTEGRAL Likelihood = -3.35 Transmembrane 20 - 36 ( 18 - 37) 

Final Results 

bacterial membrane --- Certainty=0.4949 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC36851 GB:L23802 pore-forming peptide [Enterococcus faecalis] 
Identities = 42/130 (32%), Positives = 70/130 (53%), Gaps = 12/130 (9%) 



Query: 7 KIRYHWQPELSWSIIYWSIAFAPIFVGLSLLYERTE IPSRVFILFAIFAVLVGIGLH 63 

K +++WQPEL+ +IIYWS F +F+ LLE I + V+ F +FA L G+ 
Sbjct: 3 KQKFYWQPEIASTIIYWSCTFCILFISLILALENNGPYLISNLVMVPFFVFAYL GIA 59 

Query: 64 RYF - 1 IENNGI LRI VS FKLFGPRKL1J ST ITKI EVTKSTLCL HVEDKSYLFYMRKWP 119 

R F + E +■ I+R V + F + L +S I K+ + ++ + ++ S +F M+K 
Sbjct: 60 RSFNMTETSLIVRDVLW- -FRKKALPL£QIEKVTY?IEKSIEIFSSEFKEGSKVFLMKKKT 117 

Query: 120 KKYFLDALAV 129 

FL+AL + 
Sbjct: 118 DSLFLEALKI 127 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 115/162 (70%), Positives = 132/162 (80%), Gaps = 1/162 (0%) 

Query: 1 MIKLFGKIRYHWQPELSWAIIYWSIAIAPIFIGLSLLYERTEIPSQVFVLFAIFIVIiVGI 60 

MIKLFGKIRYHWQPELSW+IIYWSIA APIF+GLSLLYERTEIPS+VF+LFAIF VBVGI 
Sbjct: 1 MI KLFGKI RYHWQPELSWS 1 1 YWS IAFAP I FVGliSIjLYERTEI PSRVF I LFAI FAVLVGI 60 

Query: 61 GFHRYFVIEEDGYLRIVSFNFLRRTKFPIEDIAKIEVTKSSVTIKFNNNHERIFYMRKWP 120 

G HRYF+IE +G LRIVSF K I I KIEVTKS++ + + +FYMRKWP 

Sbjct: 61 GLHRYFIIENNGILRIVSFKLFGPRKLLISTITKIEVTKSTLCLHVEDK-SYLFYMRKWP 119 

Query: 121 KKyFIiE5?iIiAIEPTFKGEVEIiLDNLIKMDYFECYRYDKKALTK 162 

KKYFLDALA+ P F+GEV L DN IK+DYFE Y++DKKALT+ 
Sbjct: 120 KKYFLDATAVNPYFQGEVILSDNFIXLDYFEVYQHDKKALTR 161 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1181 

A DNA sequence (GBSx1257) was identified in S.agalactiae <SEQ ID 3675> which encodes the amino 
acid sequence <SEQ ID 3676>. This protein is predicted to be peptidase t (pepT). Analysis of this protein 
sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA20627 GB:L27596 tripeptidase [Lactococcus lactis] 
Identities = 274/406 (67%) , Positives = 334/406 (81%) , Gaps = 4/406 (0%) 

Query: 1 MSYEKLLERFLTWKINTRSNPNSTQTPTTQSQVDFALTVLKPEMEAIGLKDVHYLPSNG 60 

M YEKLL RFL YVK+NTRS+ NST TP+TQ+ V+FA + +M+A+GLKDVHYL SNG 
Sbjct: 1 MKYEKLLPRFIiEYVKVNTRSDENSTTTPSTQALVEFAHK-MGEDMKALGLKDVHYLESNG 59 

Query: 61 YLVGTLPATSDRLRHKIGFISHMDTADFNAENITPQIVDYKGGD- - IELGDSGYILSPKD 118 
Y++GT+PA +D+ KIG ++H+DTADFNAE + PQI++ G+ I+LGD+ + L PKD 

Sbjct: 60 WIGTIPANTDKKVIUCIGLLAHLDTADFMAEGVNPQILENYDGESVIQLGDTEFTLDPKD 119 

Query: 119 FPNIiNNYHGQTLITTDGKTLLGADDKSGIAEIMTAMEYLAS-HPEIEHCEIRVGFGPDEE 177 
FPNL NY GQTL+ TDG TLLG+DDKSG+AEIMT +YL + +P+ EH EIRVGFGPDEE 
Sbjct: 120 ...IiVHTDGTTLLGSDDKSGVAEIMTLADYLtNINPDFEHGEIRVGFGPDEE 179 

Query: 178 ...ODFDVDFAYTVDGGPLGELQYETFSAAGLELTFEGRNVHPGTAKNQMINA 237 
...DFDVDFAYTVDGGPLGELQYETFSAAG + F+G+NVHPGTAKN M+NA 
Sbjct: 180 IGVGADKFDVADFDVDFAYTVDGGPLGELQYETFSAAGAVIEFQGKNVHPGTAKlfflMVNA 239 

Query: 238 LQLAMDFHSQLPENERPEQTDGYQGFYHLYDLSGTVDQAKSSYI IRDFEEVDFLKRKHIA 297 
LQIA+D+H+ LPE +RPE+T+G +GF+HL L GT ++A++ YIIRD EE F +RK L 
Sbjct: 240 IQIAIDYHNALPEFDRPEKTEGREGFFHLLKL3GTPEFARAQYIIRDHEEGKFNERKALM 299 

Query: 298 QDIADNMNEALQSERVKVKLYDQYYNMKKVIEKDMTPIN^AKEVMEELDIKPIIEPIRGG 357 
Q-t-IAD MN L RVK + DQYYNM ++IEKDM+ I+IAK+ ME LDI PIIEPIRGG 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3677> which encodes the amino acid 
sequence <SEQ ID 3678>. Analysis of this protein sequence reveals the following: 



Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2938 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 305/406 (75%), Positives = 352/406 (86%), Gaps = 1/406 (0%) 

Query: 1 MSYEKLLERFLTYVKINTRSWPNSTQTPTTQSQVDFAL'rVLKPEMEAIGLKDVHYLPSNG 60 

M Y+ LI1+RF+ YVK+NTRS P+S TP+T+SQ FALT+LKPEMEAIGL+DVHY P NG 
Sbjct: 5 MKYDNLLDRFIKOTKVNTRSVPDSETTPSTESQEAFALTILKPEMEAIGLQDVHYNPVKG 64 

Query: 61 YLVGTLPATSDRLRHKIGFISHMDTADFNAENITPQIVD-YKGGDIELGDSGYILSPKDF 119 

YL+GTLPA + L KIGFI+HMDTADFNAEN+ PQI+D Y+GGDI LG S Y L PK F 
Sbjct: 65 YLIGTLPANNPTLTRKIGFIAHMDTADFNAENVNPQIIDNYQGGDITLGSSNYKLDPKAF 124 

Query: 120 PNLNNYHGQTLITTDGKTLLGADDKSGIAEIMTAMEYLASHPEIEHCEIRVGFGPDEEIG 179 

PNIiNNY GQTLITTDG TLLGADDKSGIAEIMTA+E+L S P+IEHC+I+V FGPDEEIG 
Sbjct: 125 PNLNKYIGQTL1TTDGTTLLGADDKSGIAEIMTAIEFLTSQPQIEHCDIKVAFGPDEEIG 184 

Query: 180 IGADKFDVKDFDVDFAYTVDGGPLGELQYETFSA^GLELTFEGRNVHPGTAKNQMINALQ 239 

+GADKF+V DF+VDFAYT+DGGPLGELQYETFSAA LE+TF GRNVHPGTAK+QMINAL+ 
Sbjct: 185 VGADKFEVADFEVDFAYTMDGGPLGELQYETFSAAALEVTFLGRNVHPGTAI<DQMINALE 244 

Query: 240 IAMDFHSQLPEMRPEQTDGYCGFYHLYDLSGTVDQAKSSYIIRDFEEVDFLKR.KHLAQD 299 

LA+DFH +LP +RPE TDGYQGFYHL L+GTV++A++SYIIRDFEE F RK ++ 
Sbjct: 245 LAIDFHEKLPAKDRPEYTDGYQGFYELTGLTGTVEEARASYIIRDFEEASFEARKVKVEN 304 

Query: 300 IADWMNEALQSERVKTOiYDQYYNMKKVIEKDMTPINIAI^EVMEELDIKPIIEPIRGGTD 359 

IA +MN L ++RV V+I> DQYYNMKKVIEKDMT I +AKEVMEEL IKP+IEPIRGGTD 
Sbjct: 305 IAQSMNAQLGTKRVIVEIjNDQYYIMKKVIEKD^ 364 

Query: 360 GSKISFMGIPTPNLFAGGENMHGRFEFVSLQTMEKAVDVILGIVAK 405 

GSKISFMGIPTPN+FAGGENMHGRFEFVSLQTME+AVDVI+G+V K 
Sbjct: 365 GSKISFMGIPTPNIFAGGENMHGRFEFVSLQTMERAVDVIIGLVCK 410 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1182 

A DNA sequence (GBSx1258) was identified in S.agalactiae <SEQ ID 3679> which encodes the amino 
acid sequence <SEQ ID 3680>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

[INTEGRAL transmembrane table garbled in the source: 13 Likelihood entries with their values lost. Surviving residue ranges: 215 - 231, 351 - 373, 211 - 233] 

Final Results 

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 8747> which encodes amino acid sequence <SEQ ID 8748> 
was also identified. Analysis of this protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: -10.58 
GvH: Signal Score (-7.5): -1.1 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence 

[INTEGRAL transmembrane table garbled in the source: 13 entries. Surviving Likelihood values (integer part only), in order: -12, -9, -7, -7, -7, -6, -6, -6, -5, -3, -2, -2, -1. Surviving range fragments, in order: - 486 ( 466 -, - 515 ( 495 -, - 321 ( 299 -, - 359 ( 340 -, - 390 ( 372 -, - 220 ( 200 -, - 76 ( 58 -, - 115 ( 95 -, - 451 ( 432 -, - 423 ( 407 -, - 268 ( 252 -, - 147 ( 130 -, - 189 ( 173 -. A further fragment reads "Likelihood = 1".] 
modified ALOM score: 2.95 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00276 GB:AF008220 YtgP [Bacillus subtilis] 
Identities = 178/545 (32%) , Positives = 302/545 (54%) , Gaps = 26/545 (4%) 

Query: 24 QMVKGTAWLTAGNFISRLIjGAiyilPWYAWMGKHAAEANALFGMGVEiyALFLLISTVGI 83 

++++GT LT G +ISR+LG +Y+IP+ +G A ALF GY Y LFL I+T+G 
Sbjct: 4 KLLRGTFVLTLGTYISRILGMVYLIPFSIMVG ATGGALFQYGYNQYTLFLNIATMGF 60 



WO 02/34771 



PCT/GB01/04789 



Query: 84 PVAVAKQVSKYNTLGKEEMSIYLVRKILQFMLIL3'3IFALI^IGSPLFASLSKGGQE-- 141 

P AV+K VSKYN+ G E S +++ + ML+ G I I+Y+ +P+FA +S GG++ 
Sbjct: 61 PAAVSKFVSKYNSKGDYETSRKMLKAGMSVMU/TGMIAFFILYLSAPMFAEISLGGKDNN 120 



Query: 142 LVPILRSLTLAVLVFPSMSVLRGFFQGFNNLKPYAISQyAEQIIRVIWMLLTAF 195 

+V ++R ++LA+LV P MS++RGFFQG + P A+SQV EQI+R+I++L F 
Sbjct: 121 GLTIDHWYVIRMVSLALLWPIMSLVRGFFQGHQMMGPTAVSQWEQIVRIIFLLSATF 180 

Query: 196 YIMRLGSGDYIAAVTQSTFAAFVGMFASIAVLLYFLW--RYNMLSALIGKTPKHIKLDTK 253 

+G + AV +TFAA +G F + V+LY W R L A++ T L K 

Sbjct: 181 LILKVFNGGLVIAVGYATFAALIGAFGGL-VVLYIYWNKRKGSLLAMMPNTGPTANLSYK 239 

Query: 254 EILIETIKEAIPFIITGAAIQIFKLIDQFSFGNTM- -ALFTNYSSEELRVMFAYFSSNPG 311 

++ E A P++ G AI ++ ID +F MA S + L ++ Y 
Sbjct: 240 KMFFELFSYAAPYVFVGLAIPLYNYIDTNTFNKAMIEAGKQAISQDMLAILTLYVQ 295 

Query: 312 KVTMILIAVATAIAGVGIPLLTENFVKNDKKAAARLVVNNLQMLLMFLLPAVAGSVILAK 371 

K+ MI +++ATA IP +TE+F + K + + +Q +L ++PAV G +L+ 

Sbjct: 296 KLVMIPVS1ATAFGLTLIPTITESFTSGNYKLLNQQINQTMQTILFLIIPAWGISLLSG 355 

Query: 372 PLYTVFYGL PQGQALGLFVISLIQT~ILSIYTVLAPMLQALFENRKAIIYFLYGLV 427 

P YT FYG P+ A L S + 1+ S++TV A +LQ + + + A++ + G+V 

Sbjct: 356 PTYTFFYGSESLHPELGANILLWYSPV-AILFSLFTVNAAILQGINKQKFAWSLVIGVV 414 

Query: 428 AKVILQLPSIFLFHAYGPLFSTWALCIPVILMYLKIHEITGFKRQAIRRTSALVLILTL 487 

K++L +P I L A G + +T + ++ ++ I G+ + + + + L+L+L+ 

Sbjct: 415 IKLVLWPLIKLMQADGAIIATALGYIASLLYGFIMIKRHAGYSYKILVKRTVLMLVLSA 474 

Query: 488 LMSFIISMIIWLMNLVI-VPDSRLVSLVYIIVIGAIGLGVYGFMMATHLLDKMIGSRAQ 546 

+M + ++ W++ I D ++ + + +++ A+G VY + L K++G R 

Sbjct: 475 IMGIAVKIVQWVLGFFISYQDGQMQAAIVWIAAAVGGAVYLYCGYRLGFLQKILGRRLP 534 

Query: 547 DLRRK 551 
RK 

Sbjct: 535 GFFRK 539 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3681> which encodes the amino acid 
sequence <SEQ ID 3682>. Analysis of this protein sequence reveals the following: 



Possible site: 49 
>>> Seems to have no N-terminal signal sequence 

[INTEGRAL transmembrane table garbled in the source: 11 Likelihood entries with their values lost. Surviving range fragments, in order: - 484 ( 466, - 321 ( 299, - 359 ( 340, - 390 ( 373, - 154 ( 137, - 116 ( 98, - 431 ( 410; 499 - 515; 173 - 189. Surviving closing positions, in order: 493, 323, 362, 398, 157, 122, 432, 519, 451, 190, 220.] 

Final Results 

bacterial membrane --- Certainty=0.4439 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 



i4%) , Gaps = 24/536 (4%) 

Query: 14 MVQGAAWSTAGNFISRLLGVLYIIPWYIIMGQYAIQANALFNMGYNWAYFLLISTTGIM 73 

+4+G T G +ISR+LG++Y+IP+ I +G ALF GYN Y FL I+T G 

Sbjct: 5 LLRGTFVLTLGTYISRILGMVYLIPFSIMVGA TGGALFQYGYNQYTLFLNIATMGFP 61 
[Remainder of this alignment garbled in the source. Surviving block start positions: Query 74 / Sbjct 62, Query 131 / Sbjct 122, Query 186 / Sbjct 182, Query 244 / Sbjct 241, Query 302 / Sbjct 297, Query 362 / Sbjct 357, Sbjct 417, 479 / Sbjct 477. Surviving match-line fragments, in order: "V+KYNS G B S", "I+YL +P+FA +S GG D", "AV +TFAA IG", "P++ G AI", "++PA+ G +L+", "+G VY".] 


An alignment of the GAS and GBS proteins is shown below. 
Identities = 320/541 (59%) , Positives = 431/541 (79%) 



Query: 12 MSQKTTKVSQQEQMVKGTAWLTAGNFISRLLGAIYIIPWYAWMGKHAAEANALFGIV1GYEI 71 
MS + +++Q+E MV+G AW TAGNFISRLLG +YIIPWY WMG++A +ANALF MGY + 
Sbjct: 1 MSTEKKQLTQEELWQGAAWSTAGNFISRLLGVLYIIPWYIWMGQYAIQANALFW1GYNV 60 

Query: 72 
YA FLLIST G+ VA+AKQV+KYN++G+ E S L+R L+ ML LG IF+ IMY+GSPL 
Sbjct: 61 

Query: 132 
FASIiS G LVPI+ SL+LAV +FP MSV+RG FQ3 KN+KPYA+SQ+AEQ+IRVIWML 
Sbjct: 121 

Query: 192 
LT F+IM+LGSGDY +AVTQSTFA&F+GM AS+ VL Y+LW+ +L+A+ K 
Sbjct: 181 

Query: 252 
K +L+ET+KE+IPFI+TG+AIQ F+LIDQ++F NTM LFT+YS +L V+F YF++NP 
Sbjct: 241 

Query: 312 
K+TM+LIAVA +1 GVGI LLTEN+VK D KAAARL+4NN++ML+MFLLPA+ G++ILA+ 
Sbjct: 301 

Query: 372 
h +A+ LFV L QT++L++YT+ +PMLQALFENRKAI YF YG++ K+4- 
Sbjct: 361 

Query: 432 
LQ+P I+L HAYGPL +TT+AL +P+ LMY +++++T F R+ +++ L LI TLLM 
Sbjct: 421 
Query: 492 IISMIIWLMNLVIVPDSRLVSLVYIIVIGA;GLe->7YGFMALATHLLDKMIGSRAQDLRRKL 552 

++ + WL+ P RL SL+Y+++IG +G+ VY + L TH LDK+IGS+A LR+KL 

Sbjct: 481 WFVMJWLLGyAFKPTGRLTSLLYLLIIGGLGMTVYTALTLLTHQLDKLIGSKASRLRQKL 541 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1183 

A DNA sequence (GBSx1259) was identified in S.agalactiae <SEQ ID 3683> which encodes the amino 
acid sequence <SEQ ID 3684>. Analysis of this protein sequence reveals the following: 

) N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4104 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06290 GB:AP001515 UDP-N-acetylmuramoylalanyl-D-glutamyl-2, 
6-diaminopimelate ligase [Bacillus halodurans] 
Identities = 153/468 (32%) , Positives = 237/468 (49%) , Gaps = 23/468 (4%) 

Query: 33 NVTFNALSYDSRQISSDTLFFA-KGATFK-KHYLDSAITAGLSFYVSETDYGADIPVILV 90 

N +++ DSR++ LFF KG T +Y A++ G VSE +PV++V 
Sbjct: 21 NPDIHSIHOTSREv^GGLFFCIKGYTVDGHDYACjQAVSNGAVAWSERPLELSVPVVVV 80 

Query: 91 ND I KKAMSLI SMSFYNNPQNKLKLIAFTGTKGKTTAAYFAYHMLKVNHR - PAMLSTMNTT 149 

D ++AM+ ++ FY P N L+L+ TGT GKTT + +++ + ++ TM T 
Sbjct: 81 RDSRRAMAQVATKFYGEPTNDLQLIGVTGTNGKTTITHL1EKIMQDQGKMTGLIGTMYTK 140 

Query: 150 LDGKSFFKSHLTTPESLDLFRMMATAVENQMTHLIMEVSSQAYLTKRVYGLTFDVGVFLN 209 

+ G ++ TTPESIi Ii R A ++ +T +MEVSS A + RV G FDV VF N 
Sbjct: 141 I-GHELKETKOTTPESLVTjQRTFADMKKSGVTTAMMEVSSHALQSGRWGCDFDVAVFSN 199 

Query: 210 ISPDHIGPIEHPTFEDYFFHKRLLME NSNAVWN SQMDHFNIVKEQVEYI 259 

+4PDH+ H T E Y F K LL V+N + D + QV 

Sbjct: 200 LTPDHLD--YHGTMERYKFAKGLLFAQLGMYQGKVAvXjNADDPASADFAEMTIAQVVTY 257 

Query: 260 PHDFYGDY-SENVITESKAFSFHVKGKLEN-TYDIKLIGKFNQENAIAAGLACLRLGVSI 317 

+ D+ +ENV S +F + E I LIGKF+ N +AA A GV + 

Sbjct: 258 GIENEADFQiffiNTOITSTGTTFEIjAAFEER^LSIHLIGKFSVYNvXAAAAAAYVSGVPL 317 

Query: 318 EDIKNGIAQTT-VPGRMEvTjTQTNGAKIFVDYAHNGDSLKKLLAVVEEHQKGDIII.VLGA 376 

++IK + + V GR E + + VDYAH DSL+ +L V E KGD+ +V+G 

Sbjct: 318 QEIKKSLEEVKGVAGRFETWHDQPFTVIVDYAHTPDSLENVLKTVGELAKGDVRVVVGC 377 

Query: 377 PGNKGQSRRKDFGDVINQHPNLQVILTADDPNFEDPLVISQEIASHINRPVTIII-DREE 435 

G++ +++R ++ N Q I T+D+P E+P+ I +++ ++I DR+E 

Sbjct: 378 GGDRDKTKRPVMAEIATTFAN-QAIFTSDNPRSEEPMDILRDMEQGAKGDSYLMIEDRKE 436 

Query: 436 AIANASTLTNCKLDAI 1 1 AGKGADAYQ 1 1 KGNRDNYSGDLEVAKKYLK 483 

AI A L + D I+IAGKG + YQ + ++ D VA++ +K 
Sbjct: 437 AI FKAI ELAK- EDD 1 1 VIAGKGHETYQQFRDRTIDFD - DRI VAQQAI K 482 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3685> which encodes the amino acid 
sequence <SEQ ID 3686>. Analysis of this protein sequence reveals the following: 

D N-terminal signal sequence 
Final Results 

bacterial cytoplasm --- Certainty=0.4717 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 350/482 (72%) , Positives = 399/482 (82%) , Gaps = 1/482 (0%) 



Query: 1 MITIDKILEILKNDHNFREILFHEHYYYNWTQNVTFNALSYDSRQISSDTLFFAKGATFK 60 

MITI+++L+ILK DHNFRE+L + Y+Y++ Q +F LSYDSRQ+ TLFFAKGATFK 
Sbjct: 1 MITIEQLLDILKKDHNFREVLDADGYHYHY-QGFSFERLSYDSRQVDC3KTLFFAK6ATFK 59 

Query: 61 KEYLDSAITAGLSFWSETDYGADIPVILVNDIKKAMSLISMSFYNNPQNKLKLIjAFTGT 120 

+YL AIT GL Y+SE DY IPV+LV DIKKAMSLI+M+FY NPQ KLKLLAFTGT 
Sbjct: 60 ADYLKEAITNGLQLYISEVDYEIiGIPVVLVTDIKKAMSLIAMAFYGNPQEKLKLIiAFTGT 119 

Query: 121 KGKTTAAYFAYHMLKVNHRPAMLSTMNTTLDGKSFFKSHLTTPESLDLFRMMATAVENQM 180 

KGKTTAAYFAYHMLK +++PAM STMNTTLDGK+FFKS LTTPESLDLF MMA V N M 
Sbjct: 120 KGKTTAAYFAYHMLKESYKPAMFSTIVnNrrTLDGKTFFKSQLTTPESLDLFAMMAECVTNGM 179 

Query: 181 THLIMEVSSQAYLTKRVYGLTFDVGVFLNISPDHIGPIEHPTFEDYFFHKRLLMENSNAV 240 

THLIMEVSSQAYL RVYGLTFDVGVFLNISPDHIGP1EHPTFEDYF+HKRLLMENS AV 
Sbjct: 180 THLIMEVSSQAYLVDRVYGLTFDVGVFLNISPDHIGPIEHPTFEDYFYHKRLLMENSRAV 239 

Query: 241 VVNSQNDHFNIVKEQvEYIPHDFYGDYSENOTTESKAFSFHVKGKLENTYDIKLIGKFNQ 300 

V+NS MDHF+ + +QV H FYG S+N IT S+AFSF KG+L YDI+LIG FNQ 
Sbjct: 240 VINSGMDHFSFLADQVADQEHVFYGPLSDNQI'TTSQAFSFEAKGQIAGHYDIQLIGHFNQ 299 

Query: 301 ENAI AAGLACLRLGVS I ED I KNGI AQTTVPGRMEVLTQTNGAKI FVDYAHNGDSLKKLLA 360 

ENA+AAGLACLRLG S+ DI + GIA+T VPGRMEVLT TN AK+FVDYAHNGDSL+KLL+ 
Sbjct: 300 ENAMAAGLACLRLGASLAD I QKG I AKTRVPGRMEVLTMTNHAICVFVDYAI-INGDSLEKLLS 359 

Query: 361 WEEHQKGDIILVLGAPGNKGQSRRKDFGDVINQEPNLQVILTADDPNFEDPLVISQEIA 420 

WEEHQ G ++L+LGAPGNKG+SRR DFG VI+QHPNL VILTADDPNFEDP IS+EIA 
Sbjct: 360 WEEHQTGKLMLILGAPGNKGESRRADFGRVIHQHPNLTVI^TADDPNFEDPEDISKEIA 419 

Query: 421 SHINRPVTIIIDREEAIANASTLTNCKLDAIIIAGKGADAYQIIKGNRDNYSGDLEVAKKYL 482 

SHI RPV II DRE+AI A +L DA+IIAGKGADAYQI+KG + Y+GDL +AK YL 

Sbjct: 420 SHIARPVEIISDREQAIQKAMSLCQGAKDAVIIAGKGADAYQIVKGQQVAYAGDLAIAKHYL 481 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1184 

A DNA sequence (GBSx1260) was identified in S.agalactiae <SEQ ID 3687> which encodes the amino 
acid sequence <SEQ ID 3688>. Analysis of this protein sequence reveals the following: 

T- terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.1421 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1185 

A DNA sequence (GBSx1261) was identified in S.agalactiae <SEQ ID 3689> which encodes the amino 
acid sequence <SEQ ID 3690>. This protein is predicted to be FhuA (fepC). Analysis of this protein 
sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2785 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9975> which encodes amino acid sequence <SEQ ID 9976> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98153 GB:AF251216 FhuC [Staphylococcus aureus] 
Identities = 141/259 (54%) , Positives = 193/259 (74%) 





Query: 7 MSHIKAENIIVSYDQKEIINNLSLSILNQKITTIIGANGCGKSTLLKALTRIHKIKDGTI 66 
M+ + + + + Y UN h + I + K+T+IIG NGCGKSTLLKAL+R+ +K+G + 
Sbjct: 1 MNRLHGQQVKIGYGDNTIINKLDVEIPDGKVTSIIGPNGCGKSTLLKALSRLIAVKEGEV 60 

Query: 67 TIDGHDIAHLPTKEIAKKIALLPQVLEATEGITVYELISYGRFPHQKYLGNLTNDDRSKI 126 
+DG +1 TKE IAKKIA+LPQ E +G+TV EL+SYGRFPHQK G LT +D+ +1 
Sbjct: 61 FLDGENIHTQSTKEIAKKIAILPQSPEVADGLTVGELVSYGRFPHQKGFGRLTAEDKKEI 120 

Query: 127 HWAMEMTIWAQFANRDVDDLSGGQRQKVWIAMAIAQDTDTIFLDEPTTYLDMNHQLEvLE 186 
WAME+T F +R ++DLSGGQRQ+VWIAMALAQ TD IFLDEPTTYLD+ HQLE+LE 
Sbjct: 121 DWAMEVTGTDTFRHRSINDLSGGQRQRWIAMALAQRTDIIFLDEPTTYLDICHQLEILE 180 

Query: 187 LLKKLNDETQKTIIMVLHDLNLSARYSDYLVAMKTGKIIYEGSPSQIMTKDIIKDIFKID 246 
L++KLN E TI+MVLHD+N + R+SD+L+AMK G II GS ++T++I++ +F ID 
Sbjct: 181 LVQKLNQEQGCTIVMVLHDINQAIRFSDHLIAMKEGDIIATGSTEDVLTQEILEKVFNID 240 

Query: 247 AHI IQDPISKQPVLLSYQL 265 
+ +DP + +P+L++Y h 
Sbjct: 241 WIiSKDPKTGKPLLVTYDL 259 





A related DNA sequence was identified in S.pyogenes <SEQ ID 1929> which encodes the amino acid 
sequence <SEQ ID 1930>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2970 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 166/259 (64%) , Positives = 208/259 (80%) 

Query: 7 MSHIKAENIIVSYDQKEIINNLSLSILNQKITniGANGCGKSTLLKALTRIHKIKDGTI 66 

M+ I AE++ ++Y+Q+ 11+ LS I KITTIIGANGCGKS+LLKALTR+ K G + 
Sbjct: 1 MTTISAEDLTIAYEQRTIIDKLSFYIPEGKITTIIGANGCGKSSLLKALTRLLPPKQGW 60 


Query: 67 TIDGHDIAHLPTKEIAKKIALLPQVLEATSGIT\7YELISYGRFPHQKYLGNLTNDDRSKI 126 

++G +IA L TKE+AKK+ALLPQV EAT GITVYEL+SYGRFPHQ Y GNL+ D+ I 
Sbjct: 61 YraGQNIATLETKEVAKIOALLPQVQEATNGITVTELVSYGRFPHQSYFGNLSPADKKAI 120 
Query: 127 HWAMEMTWAQFAl^DVDDLSGGQEQKOTVIAMALAQDTDTIFLDEPTTYLDMNHQLEVLE 186 

HWAM+ TNV +A++ VD LSGGQRQ+VW+AMALAQ TDTIFLDEPTTYLD+NHQLE+LE 
Sbjct: 121 HWAMQATNVmYADQPVDALSGGQRQRWLAMALAQGTDTIFLDEPTTYLDLNHQLEILE 180 

Query: 187 LLKKIiNDETQKTI IMVLHDLNLSARYSDYLVAMKTGKI IYEGSPSQIMTKDI IKDI FKID 246 

L+K JM + KTI+MVLHDLNLSARYSD+IH-AMK GKI Y G+ + +MT II+DIF+I 
Sbjct: 181 LVKSLNKDAGKTIVMVLHDLNLSARYSDHLIAMKHGKIHYTGTIADVMTSPIIQDIFQIK 240 

Query: 247 AHIIQDPISKQPVLLSYQL 265 
++ DPI P++L+YQL 

Sbjct: 241 PVLVDDPIHNCPIVLTYQL 259 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

15 Example 1186 

A DNA sequence (GBSxl262) was identified in S.agalactiae <SEQ ID 3691> which encodes the amino 

acid sequence <SEQ ID 3692>. This protein is predicted to be ferrichrome ABC transporter. Analysis of 

this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
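
A "Final Results" block of this kind lists one certainty value per predicted location. The sketch below shows one way such a block could be read programmatically and the highest-certainty location selected. It is an illustration only: the regular expression, the function names and the tolerance for stray spaces inside the number are assumptions of this sketch, not part of the analysis described here.

```python
import re

# Hypothetical pattern: location name, optional dashes, certainty, call.
# Whitespace is allowed inside the number because scanned copies of such
# listings often contain values such as "0 . 0000".
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[0-9.\s]+?)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each location to a (certainty, call) pair."""
    results = {}
    for match in LINE_RE.finditer(text):
        value = float("".join(match.group("value").split()))
        results[match.group("site")] = (value, match.group("call"))
    return results


def most_likely_site(results: dict) -> str:
    """Return the location with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Values copied from the Final Results block above.
example = """
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(most_likely_site(parsed), parsed)
```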



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07609 GB:AP001520 ferrichrome ABC transporter
(ferrichrome-binding protein) [Bacillus halodurans]
Identities = 94/301 (31%), Positives = 177/301 (58%), Gaps = 11/301 (3%)



Query: 6 1 IVLTLLTFFLV SCGQQTKQESTKTTISK- -ME 

+++LT+L F L+ +CG T E S+ M E T ++P NP++V+ 

Sbjct: 7 LLLLTMLLFALLWAACGSNTDAEQADELESEDGMITYESETGPIEVPANPQRW--ALG 64 

Query: 61 YTGYLLKLGVNVSSYSLDLEKDSPVFGKQLKEAKKLTADDTEAIAAQKPDLIMVFDQDPN 120 

+TG +L L VNV K++P + + L++ +++ ++ E I PDLI+ + N 

Sbjct: 65 FTGNILALDVNWGVDT-WSKNNPNYEQLLQDVTEVSEENLEQIMELDPDLIIAYSTVQN 123

Query: 121 INTLKKIAPTLVIKTYGAQNYLDMMPALGKVFGKEKE^OTQWVSQWKTKTLAVKKDLHHILK 180 

L++IAPT++ Y +YL+ +GK+ KE+EA WV +K + +++ + 

Sbjct: 124 AEQLQEIAPTVLYTYNNIZIYLEQ^/EIGKLIjNICEEEAQAWVDDFKAEAEQAGEEIKEKIG 183 



Query: 181 PNTTFTIMDFYDKNIYLYGNNFGRGGELIYDSLGYAAPEKVKKDVFKKGWFTVSQEAIGD 240 

+ T ++++ ++ +Y++GNN+GRG E++Y ++ A PE+V++ G++ +S EA+ + 

Sbjct: 184 EDAWSVIETFEDQLYVFGNNWGRGTEILYC/IWDLMIPERVEEMALADGYYALSFEALPE 243 

Query: 241 YVGDYALWINKTTKKAASSLKESDVWKNLPAVKKGHIIESNYDVFYFSDPLSLEAQLKSF 300

+ GDY +++ N +A +S +E++ ++++PAV+ G + E+N FYF+DPLSLE QL+ F 
Sbjct: 244 FAGDYIILSKN---DEADNSFQETNTYQSIPAVQNGQVFEANAKEFYFNDPLSIjELQLEFF 301 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3693> which encodes the amino acid 
sequence <SEQ ID 3694>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> May be a lipoprotein



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB07609 GB:AP001520 ferrichrome ABC transporter 

(ferrichrome-binding protein) [Bacillus halodurans] 
Identities = 112/306 (36%), Positives = 178/306 (57%), Gaps = 3/306 (0%)

Query: 2 KKLTLLLTLCLTTITLIACGNQATNHSNTASKSLSPMPQIAGVTYYGDIPKQPKRWSLA 61 

KILL L + + ACG+ +S M T ++P P+RW+L 

Sbjct: 5 KHLLLLTMLLFALLWAACGSNTDAEQAD3LESEDGMITYESETGPIEVPANPQRWALG 64 

Query: 62 STYTGYLKKLDMNLVGVTSYDKKNPIIiAKTVKKAKQVAATDLEAVTTLKPDIilWGSTEE 121 

+TG + LD+N+VGV ++ K NP + ++ +V+ +LE + L PDLI+ ST + 
Sbjct: 65 --FTGNILALDVNWGVDTWSKNNPNYEQLLQDVTEVSEENLEQIMELDPDLIIAYSTVQ 122 

Query: 122 NIKQLAEIAPVIS IEYRKRDYLQVLSDFGRI FNKEDKAKKWLKDWKTKTAAYEKEVKAVT 181 

N +QL EIAP + Y DYL+ + G++ NKE++A+ W+ D+K + +E+K 
Sbjct: 123 NAEQLQEIAPTVLYTYNNIiDYLEQHVEIGKLLNKEEEAQAWVDDFKARAEQAGEEIKEKI 182 

Query: 182 GDKATFTIMGLYEKDVYLFGKDWGRGGEIIHQAFHYDAPEKVKTEVFKQGYLSLSQEVLP 241 

G+ AT +++ +E +Y+FG +WGRG EI++Q PE+V+ GY +LS E LP 

Sbjct: 183 GEDATOSVIETFEDQLWFGNNWGRGTEILYQTMDLAMPERVEEMALADGYYALSFEALP 242 

Query: 242 DYIGDYVWAAEDDKTGSALYESKLWQS I PAVKKHHVI KVNANVFYFTDPLSLEYQLETL 301 

++ GDY+++ +++D+ ++ E+ +QSIPAV+ V + NA FYF DPLSLE QLE 
Sbjct: 243 EFAGDYIIL-SKNDEADNSFQETNTYQSIPAVQNGQVFEANAKEFYFNDPLSLELQLEFF 301 

Query: 302 REAILS 307 

+E LS 
Sbjct: 302 KEHFLS 307 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 140/316 (44%), Positives = 212/316 (66%), Gaps = 12/316 (3%)

Query: 1 MKKIGIIV-LTLLTFFLVSCGQQTKQESTKTT--ISKMPKIEGFTYYGKIPENPKKVINF 57 

MKK+ +++ L L T L++CG Q S + +S MP+I G TYYG IP+ PK+V++ 
Sbjct: 1 MKKLTLLLTLCLTTITLIACGNQATNHSNTfiSKSLSPMPQIAGVTYYGDIPKQPKRWSL 60 

Query: 58 TYSYTGYLLKLGYN- - -VSSYSLDLEKDSPVFGKQLKEAKKLTADDTEAIAAQKPDLIMV 114 

+YTGYL KL +N V+SY +K +P+ K +K+AK++ A D EA+ KPDLI+V 
Sbjct: 61 ASTYTGYLKKLDMNLVGVTSY DKKNPIIAKTVKKAKQVAATDLEAVTTLKPDLIW 116 

Query: 115 FDQDPNINTLKKIAPTLVIKYGAQNYLDMMPAtGKATFGKEKEANQVWSQWKTKTIAVKKD 174 

+ NI L +IAP + I+Y ++YL ++ G++F KE +A +W+ WKTKT A +K+ 
Sbjct: 117 GSTEENIKQIAEIAPVISIEYRIOJDYLQVLSDFGRIFNKEDKAKKWLKDWKTKTAAYEKE 176 

Query: 175 LHHILKPNTTFTIMDFYDKNIYLYGNNFGRGGEL1YDSLGYAAPEKVKKDVFKKGWFTVS 234 

+ + TFTIM Y+K++YL+G ++GRGGE+I+ + Y APEKVK +VFK+G+ ++S 

Sbjct: 177 VKA.VTGDKATFTIMGLYEE03VYLFGKDWGRGGEIIHQAFHYDAPEKVKTEVFKQGYLSLS 236 

Query: 235 QEAIGDYVGDYALVNINKTTKKAASSLKESDVWKNLPAVKKGHIIESNYDVFYFSDPLSL 294 

QE + DY+GDY +V K S+L ES +W+++PAVKK H+I+ N +VFYF+DPLSL 

Sbjct: 237 QEVLPDYIGDYVWAAE--DDKTGSALYESKLWQSIPAVKKHHVIKVNANVFYFTDPLSL 294 

Query: 295 EAQLKSFTKAI KENTN 310 

E QL++ +AI + N 
Sbjct: 295 EYQLETLREAILSSEN 310 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1187

A DNA sequence (GBSx1263) was identified in S.agalactiae <SEQ ID 3695> which encodes the amino
acid sequence <SEQ ID 3696>. Analysis of this protein sequence reveals the following:

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3431 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1188 

A DNA sequence (GBSx1264) was identified in S.agalactiae <SEQ ID 3697> which encodes the amino
acid sequence <SEQ ID 3698>. This protein is predicted to be ferrichrome transport permease (permease). 
Analysis of this protein sequence reveals the following: 



Possible site:
>>> May be a lipoprotein

INTEGRAL Likelihood =-12.74 Transmembrane 129 - 145 ( 123 - 150)
INTEGRAL Likelihood =-10.67 Transmembrane 248 - 264 ( 240 - 283)
INTEGRAL Likelihood =-10.14 Transmembrane 205 - 221 ( 196 - 228)
INTEGRAL Likelihood = -5.95 Transmembrane 319 - 335 ( 317 - 336)
INTEGRAL Likelihood = -3.56 Transmembrane 73 - 89 ( 73 - 90)
INTEGRAL Likelihood = -3.19 Transmembrane 288 - 304 ( 288 - 304)
INTEGRAL Likelihood = -2.76 Transmembrane 266 - 282 ( 265 - 283)
INTEGRAL Likelihood = -?.23 Transmembrane 103 - 119 ( 101 - 122)
INTEGRAL Likelihood = -1.01 Transmembrane 158 - 174 ( 158 - 174)

Final Results

bacterial membrane --- Certainty=0.6095 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98154 GB:AF251216 FhuB [Staphylococcus aureus] 
Identities = 116/313 (37%), Positives = 194/313 (61%), Gaps = 3/313 (0%) 

Query: 26 ILFLIGCYASLRFGAINFKTSDLITVLKNPLKNSNAQDVIFDIRLPRIIAAILVGAAMSQ 85 

++ LI + S G + S +1 + N ++ Q++I +IR+PR 1AA++VG A++ 
Sbjct: 28 MILLITLFISTLIGDAKIQASTIIEAIFNYNPSNQQQNIINEIRIPRNIAAVIVGMALAV 87 

Query: 86 AGAIMQGVTRNAIADPGLLGINAGAGLALWAyAFLGSMHYSTILIVCLLGSVISCLLVF 145 

+GAI+QGVTRN +ADP L+G+N+GA AL + YA L + + ++ LG+++ +V 
Sbjct: 88 SGAIIQGOTPJSGLADPALIGLNSGASFALALTYAVLPNTSFLILMFAGFLGAILGGAIVL 147 

Query: 146 TLSYTKQKGYHQLRLIIAGAMISTLFTSVGQVVTLYFKLNRTVIGWQAGGLSQINWKMLI 205 

+ +++ G++ +R+ILAGA +S + T++ Q + L F+LN+TV W AGG+S W L 
Sbjct: 148 MIGRSRRDGroPMRIILAGAAVSAMLTALSQGIAI^FRLNQTVTFWTAGGVSGTTWSHLK 207 

Query: 206 IIAPIIILGLLISQLLAHQLTILSLNESVAKALGQKTQLMTAFLLLIVLFLSASSVALIG 265 

P+I + L I ++ QLTIL+L ES+AK LGQ ++ L+I + L+ +VA+ G 
Sbjct: 208 WAIPLIGIALFIILTISKQLTILNLGESLAKGLGQNVTMIRGICLIIAMILAGIAVAIAG 267 

Query: 266 WSFIGLIIPHFIKLFIPKDYRLLLPLJGFSGATFMIWVDLSSRIINPPSETSISSIISI 325

V+F+GL++PH + I DY +LPL G ++ D+ +R + E + +IIS 
Sbjct: 268 QVAFVGLMVPHIARFLIGTDYAKILPLTALLGGILVIjVADVIARYL GEAPVGAIISF 324

Query: 326 VGLPCFLWLIRKG 338 

+G+P FL+L++KG 
Sbjct: 325 IGVPYFLYLVKKG 337 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3699> which encodes the amino acid 
sequence <SEQ ID 3700>. Analysis of this protein sequence reveals the following: 

Possible site:
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.09 Transmembrane 256 - 272 ( 248 - 287)
INTEGRAL Likelihood =-10.67 Transmembrane 26 - 42 ( 23 - 48)
INTEGRAL Likelihood = -6.90 Transmembrane 137 - 153 ( 133 - 157)
INTEGRAL Likelihood = -5.10 Transmembrane 167 - 183 ( 166 - 187)
INTEGRAL Likelihood = -4.57 Transmembrane 213 - 229 ( 210 - 232)
INTEGRAL Likelihood = -2.02 Transmembrane 112 - 128 ( 110 - 131)
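
The transmembrane listing above can be turned into structured data for further inspection. The sketch below, offered only as an illustration, parses the core segment positions (the first pair of numbers on each line, ignoring the wider range in parentheses), orders the segments along the sequence and reports their lengths. The `Segment` type, the regular expression and the function name are assumptions of this sketch.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # more negative values indicate a stronger prediction
    start: int
    end: int

    @property
    def length(self) -> int:
        # Positions are inclusive, so 256 - 272 spans 17 residues.
        return self.end - self.start + 1


SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> list:
    """Extract (likelihood, start, end) for every INTEGRAL line."""
    return [Segment(float(l), int(s), int(e)) for l, s, e in SEG_RE.findall(text)]


# Core positions copied from the listing above.
listing = """
INTEGRAL Likelihood =-11.09 Transmembrane 256 - 272
INTEGRAL Likelihood =-10.67 Transmembrane 26 - 42
INTEGRAL Likelihood = -6.90 Transmembrane 137 - 153
INTEGRAL Likelihood = -5.10 Transmembrane 167 - 183
INTEGRAL Likelihood = -4.57 Transmembrane 213 - 229
INTEGRAL Likelihood = -2.02 Transmembrane 112 - 128
"""
segments = sorted(parse_segments(listing), key=lambda seg: seg.start)
for seg in segments:
    print(seg.start, seg.end, seg.length, seg.likelihood)
```

Sorting by start position rather than by likelihood makes it easier to see how the predicted segments are distributed along the protein.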



Final Results 

bacterial membrane --- Certainty=0.5437 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 34 LSFSLCVAIYCHIjRFGAVALSIIQDLNEILFG-KQNGHKANVLIjAIRLPRLFGATLTGSAL 92 

LS L + ++ G + + +F + + N++ IR+PR A + G AL 

Sbjct: 26 LSMILLITLFISTLIGDAKIQASTIIEAIFNYNPSNQQQNI INEIRIPRNIAAVIVGMAL 85 

Query: 93 AVSGTIMQAITRNPIAEPGLLGINAGAGLALVLAYAFVPHLHYSLIILLSLLGSSLAATL 152 

AVSG I+Q +TRN +A+P L+G+N4GA AL L YA +P+ + +++ LG+ L + 
Sbjct: 86 AVSGAIIQGVTRNGLADPALIGLNSGASFALALTYAVLPNTSFLILMFAGFLGAILGGAI 145 

Query: 153 VFGLSYQSGKGYHQLRLVLAGAMVSILLSALGQGITNYYHLANAVIGWQAGGLVGVNWQM 212 

V + G++ +R++LAGA VS +L+AL QGI + L V W AGG+ G W 

Sbjct: 146 VLMIGRSRRDGFNPmillAGAAVSAMLTALSQGIAI^FRI^QTVTFWTAGGVSGTTWSH 205 

Query: 213 IGYIAPLIILSLCIAQLLSYHLTVLSLSESQAKALGQKTNLISAVFMILVLILSSAAVAI 272 

+ + PLI ++L + +S LT+L+L ES AK LGQ +1 + +1+ +IL+ AVAI 
Sbjct: 206 LKWAIPLIGIALFIILTISKQLTILNLGESLAKGLGQNVTMIRGICLIIAMILAGIAVAI 265 

Query: 273 AGS I S F IGLVI PHLMKHFTPHHYRYLLPLCAVSG 306 

AG ++F+GL++PH+ + Y +LPL A+ G 

Sbjct: 266 AGQVAFVGLMVPHIARFLIGTDYAKILPLTALLG 299 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/295 (53%), Positives = 214/295 (71%), Gaps = 1/295 (0%)

Query: 6 KKLVQKNKSNHFWLVFFITLILFLIGCYASLRFGAINFKTSDLITVLKNPLKNSNAQDVI 65 

KK KS+ FWLVF + + Y LRFGA+ DL ++L +N + +V+ 

Sbjct: 16 KKTQIITKSHIFWLVFVLLSFSLCVAIYCHLRFGAVALSHQDLNSILFGK-QNGHKANVL 74 

Query: 66 FDIRLPRIIAAILVGAaMSQAGAIMC^VTRNAIADTOLLGINAGAGLALVVAYAFLGSMH 125 

IRLPR+ A L G+A++ +G IMQ +TRN IA+PGLLGINAGAGLALV+AYAF+ +H 
Sbjct: 75 BAIRLPRLFGATLTGSALAVSGTIMQAITRNPIAEPC-LLGINAGAGLALVLAYAFVPHLH 134 

Query: 126 YSTILIVCLLGSVISCLLVFTLSYTKQKGYHQLRLILAGAMISTLFTSVGQVVTLYFKLN 185 

YS I+++ LLGS ++ LVF LSY KGYHQLRL+LAGAM+S L +++GQ +T Y+ L 
Sbjct: 135 YSLIILLSLIiGSSIjAATLVFGLSYQSGKGYHQLRLvT^GAlWSILLSALGOJSITNYYHLA 194 



Query: 186 RTVIGWQAGGLSQINWKMLIIIAPIIILGLLISQLLAHQLTILSLNESVAKALGQKTQLM 245 



VIGWQAGGL +NW+M+ IAP+IIL L ++QLL++ LT+LSL+ES AKALGQKT L+ 
Sbjct: 195 NAVIGWQAGGLVGVNWQMIGYIAPLIILSLCLAQLLSYHLTVLSLSESQAKRLGQKTNLI 254 

Query: 246 TAFLLLIVLFLSASSVALIGWSFIGLIIPHFIKLFIPKDYRLLLPLIGFSGATF 300 
+A +++VL LS+++VA+ G++SFIGL+IPH +K F P YR LLPL SGA+F

Sbjct: 255 SAVFM I hVhlhS S AAVAI AGS I S FIGLVI PHLMKHFTPHH YRYLLPLCAVSGAS F 309 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1189

A DNA sequence (GBSx1265) was identified in S.agalactiae <SEQ ID 3701> which encodes the amino
acid sequence <SEQ ID 3702>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1492 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1190

A DNA sequence (GBSx1266) was identified in S.agalactiae <SEQ ID 3703> which encodes the amino
acid sequence <SEQ ID 3704>. This protein is predicted to be ferrichrome transport permease (permease). 
Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5
INTEGRAL Likelihood = -5.47 ... 174)
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -3.35 Transmembrane 91 - 107 ( 90 -

Final Results

bacterial membrane ---
bacterial outside ---
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98155 GB:AF251216 FhuG [Staphylococcus aureus] 
Identities = 122/334 (36%), Positives = 208/334 (61%), Gaps = 3/334 (0%)

Query: 1 MIQKNKAPFVLISSVIILLLLILV SISLGYANTSVIDVLKLISGKSDDAFLFIITNI 57 

MI N LI+ + +LL L SI+ GNV K + G+D I+ + 

Sbjct: 1 MISSNNKRRQLIALAVFSILLFLGCTOSITSGEYIJ-PVERFFKTLIGQGDAIDELILLDF 60 

Sbjct: 61 RLPRMMITILAGAALSISGAIVQSVTKNPIAEPGILGINAGGGFAIALFIAIGKINADNF 120

Query: 118 LYFLPLFAMFGGLVTIFLIYLMSYRENHNISPTRLIVTGIGISTIISGVMILIISQSNNQ 177 

+Y LPL ++ GG+ T +I++ S+ +N ++P +++ G+G+ T + G I I+S+ +++ 
Sbjct: 121 VYVLPLISILGGITTALIIFIFSFNKNEGVTPASMVLIGVGLQTALYGGSITIMSKFDDK 180 

Query: 178 KMDMIVEWLSGKITISSWTTIITFIPILILLWGLAYSRSRHLNIMNLNEQTAIiALGLHLK 237 

+ D I W +G I W +1 F+P ++++ +S LNI++ + A LG+ L 

Sbjct: 181 QSDFIAAWFAGNIWGDEWPFVIAFLPWVLIIIPYLLFKSHTIJIIIHTGDNIARGLGVRLS 240 

Query: 238 KERIYTLMLTSSLAAISWLIGKITFIGLLAGHLSRRLLGKNHKIILPSCLLIGAIILLV 297 

+ER+ + L++ +V + G+I+FIGL+ H+++R++G H++ LP +L+GA +L++ 
Sbjct: 241 RERLILFFIAVMLSSAAVAVAGSISFIGLMGPHIAKRIVGPRHQLFLPIAILVGACLLVI 300 

Query: 298 SDTIGRLLLVGTGI PTGLWSI IGAPYFLWLMTK 331 

+DTIG+++L G+P G+W+IIGAPYFL+LM K 
Sbjct: 301 ADTIGKIVLQPGGVPAGI WAI IGAPYFLYLMYK 334 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1939> which encodes the amino acid 
sequence <SEQ ID 1940>. Analysis of this protein sequence reveals the following: 

Possible site: 37 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.5373 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/322 (47%), Positives = 229/322 (70%), Gaps = 1/322 (0%)

Query: 11 LISSVIILLLLIL-VSISLGYANTSVIDVLKLISGKSDDAFLFIITNIRLPRI IVCIFGG 69
L +S+I+LL+ ++ +++SLG +4 S +D++ + GKS A FI+ NIRLPRI+ GG
Sbjct: 22 LYTSLILLLVSLMGLALSLGESHLSFLDLVHVFLGKSSHAISFIVINIRLPRIIAACLGG 81

Query: 70 ASLGIAGLLLQTLTKNPLADSGILGINAGAGLVIALTIGTFNVSNPTILYFLPLFAMFGG 129
SL ++GLLLQ LT+NPLADSG+LGI GAG+ +A+ + I ++LPLFAM G
Sbjct: 82 GSLALSGLLLQRLTRNPIADSGVLGITIGAGISLAIWSFSFFEQAHISHYLPLFAMLGA 141

Query: 130 LVTIFLIYLMSYRRNHNISPTRLIVTGIGISTIISGVMILIISQSNNQKMDMIVEWLSGK 189
+VT F +Y +S + I PTRLI+TG+ ++T++S +M+ ++ N K+D+++ WLSG+
Sbjct: 142 IVTTFSVYWLSLTKQGQIDPTRLILTGVAVTTMLSSLMVALVGHINRYKVDLVINWLSGQ 201

Query: 190 ITI SSWTTI ITFI PILILLWGIAYSRSRHLNI^TJNEQTALALGLHLKKERI YTLMLTSS 249
+ W T+ P+L+ W L YS++ LNIM L + TA+ LGL L ++R L+L +
Sbjct: 202 LIGDDWPTLSVIAPLLLCFWLLTYSQAHFMIMGLADNTAIGLGLPLNRKRRLILVLAAG 261

Query: 250 LAAISVVLIGNITFIGLIAGHLSRRLLGNNHKIILPSCLLIGAIILLVSDTIGRLLLVGT 309
L A+SV+L+GNI+FIGL+AGH S L+G+NHKI +P +LIG I +LLV+DT+GR+ LVG+
Sbjct: 262 LGALSVLLVGNISFIGLIAGHFSTYLVGSNHKITIPISILIGMILLLVADTVGRVYLVGS 321

Query: 310 GIPTGLWSI IGAPYFLWLMTK 331
I TG++VS+IGAPYFL+LM K
Sbjct: 322 NIQTGILVSLIGAPYFLYLMAK 343





There is also homology to SEQ ID 396. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1191 

A DNA sequence (GBSx1267) was identified in S.agalactiae <SEQ ID 3705> which encodes the amino
acid sequence <SEQ ID 3706>. Analysis of this protein sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3785 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05779 GB:AF051356 unknown [Streptococcus mutans]

Identities = 49/93 (52%), Positives = 63/93 (57%)

Query: 1 MILTFNPGKLERQEFFKELINYLWIHDDVTLRKIKSHFTDYSKIDRLLEEYINHGYIIjRQ 60 
MI +N KL RQ FF +LINYL IHDDVTLR+IK +F D +4R +E+Y+ GY+LR+ 
Sbjct: 1 MIKIYNGDKLTRQPFFIKLINYLQIHDDVTLRQIKRNFADTEHLERSIEDYVQAGYVLRE 60



Query: 61 NKRYSLNLPFLSSLDGLVLDDLVFIDSDSQIYQ 93 

NKY L +LDGL LD +F+D S IYQ 

Sbjct: 61 NKHYYNAFELLENLDGLTLDSQ I FVDDQSS IYQ 93 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3707> which encodes the amino acid 
sequence <SEQ ID 3708>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3447 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 108/212 (50%), Positives = 143/212 (66%)



Sbjct: 1 



Query: 61 NKEYSimPFLSSLDGLVLDDLVFIDSDSQIYQLLQKRKFVTOLDNPTNHLVFVEETDFE 120 

NKRY +NLP +SS L LD ++F+D+ S +Y+ + F T L N TN ++ E+T+ 
Sbjct: 61 NKRYGINLPLVSSDQQLALDTMLFTOTCSAMYF^ 120 

Query: 121 RM'LTLSNYFYKLTNGYPLSREQKKLYQLLGDVNSEYALKYMSSFILKFLRKDSVKQKRT 180 

R+ LTL+NYFY+L G S EQ LY LLGDVN EYALKYM++F+LKF RKD V QKR 
Sbjct: 121 RDDLTLANYFYRLKRGEKPSAEQMDLYDLLGDVNQEYALKYMTTFLLKFTRKDFVMQKRP 180 

Query: 181 VIFIQALELLGYISLNQDTTYRLNAKLDVEAL 212 

IF++AL LGY+ + TTY+L LD E+L 
Sbjct: 181 DIFVEALVTLGYLKQVEPTTYQLLMTLDKESL 212 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1192

A DNA sequence (GBSx1268) was identified in S.agalactiae <SEQ ID 3709> which encodes the amino
acid sequence <SEQ ID 3710>. Analysis of this protein sequence reveals the following:

Possible site: 24
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0824 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB39104 GB:U57759 intrageneric coaggregation-relevant adhesin 
[Streptococcus gordonii] 
Identities = 261/311 (83%), Positives = 283/311 (90%)



Query: 1 MSKILVFGHQNPDSDAISSSVAFAYLAKEAWGLDTEAVALGTPNEETAYVLDYFGVQAPR 60 

MSKILVFGHQNPDSDAIGSS AFAYLA+EA+GLDTEAVALG PNEETA4VLDYFGV APR 
Sbjct: 1 MSKILVFGHQNPDSDAIGSSYAFAYLAREAYGLDTEAVALGEPNEETAFVLDYFGVAAPR 60 



Sbjct: 61 VITSAKAEGAEQVILTDHNEFQQSVADIAEVEVYGWDHHRVANFETANPLYMRLEPVGS 120 

Query: 121 ASSIVYRMFKENGVSVPKELAGLLLSGLISDTLLLKSPTTHASDIPVAKELAELAGVNLE 180 

ASSIVYRMFKE+ V+V KE+AGL+LSGLISDTLLLKSPTTH +D +A ELAELAGVNLE 
Sbjct: 121 ASSIVYRMFKEHSVAVSKEIAGLMLSGLISDTLLLKSPTTHPTDKAIAPELAELAGvNLE 180 

Query: 181 EYGLEMLKAGTNLSSKTAAELIDIDAKTFELNGEAWVAQVNTvDINDILARQEEIEVAI 240 

EYGL MLKAGTNL+SK+A ELIDIDAKTFELNG VRVAQVNTvDI ++L RQ EIE AI 
Sbjct: 181 EYGLAMLKAGTNLASKSAEELIDIDAKTFELNGM^TWAQ WTVE IAEVLERQAE I EAAI 240 

Query: 241 QEAIVTEGYSDFVLMITDIWSNSEILALGSNMAKVFJiAFEFTLENNHAFLAGAVSRKKQ 300 

++AI GYSDFVLMITDI+NSNSEILA+GSNK KVEAAF F LENNHAFLAGAVSRKKQ 
Sbjct: 241 EKAIADNGYSDFVLMITDIINSNSEIIAIGSNMDKVEAAFNFVLENNHAFLAGAVSRKKQ 300 

Query: 301 WPQLTESYNA 311 

WPQLTES+NA 
Sbjct: 301 WPQLTESFNA 311 
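
The "Identities" count in an alignment header is the number of aligned columns in which the query and subject residues are the same. As a minimal sketch, the final alignment row above (WPQLTESYNA against WPQLTESFNA) can be counted column by column. The function names are hypothetical, and the simplified midline produced here marks only identical residues; the "+" symbols for conservative substitutions would require a substitution matrix, which this sketch does not attempt.

```python
def count_identities(query: str, subject: str) -> int:
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def identity_midline(query: str, subject: str) -> str:
    """Build a simplified midline showing identical residues only."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, subject))


# Last row of the alignment above.
query_row = "WPQLTESYNA"
subject_row = "WPQLTESFNA"
print(count_identities(query_row, subject_row), "of", len(query_row))
print(identity_midline(query_row, subject_row))
```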

A related DNA sequence was identified in S.pyogenes <SEQ ID 3711> which encodes the amino acid
sequence <SEQ ID 3712>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.02 Transmembrane 141 - 157 ( 141 - 157) 



Final Results 

bacterial membrane --- Certainty=0.1808 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9103> which encodes the amino acid sequence 
<SEQ ID 9104>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.02 Transmembrane 139 - 155 ( 139 - 155) 



Final Results 

bacterial membrane --- Certainty=0.181 (Affirmative) < succ>

bacterial outside --- Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 253/311 (81%), Positives = 283/311 (90%)

Query: 1 MSKILVFGHQNPDSDAIGSSVAFAYLAKEAWGLDTEAVALGTPNEETAYVLDYFGVQAPR 60 

MSKILVFGHQNPD+DAI SS AF- YL+++A+GLDTE VALGTPNEETA+ LDYFGV+APR 
Sbjct: 3 MSKILVFGHQNPDTnAIASSYAFDYLSQKAFGLDTEWALGTPNEETAFALDYFGVEAPR 62 

Query: 61 VAffiSAKAEGVETVlLTDHNEFQQSISDIKDVTVYGVVTJHHRVANFETANPLYMRLEPVGS 120 

WESAKA+G E VILTDHNEFQQSI+DI++V VYGWDHHRVANFETANPLYMR+EPVGS 
Sbjct: 63 WESAKAQGSEQVILTDHNEFQQSIADIREVEVYGVVDHHRVANFETANPLYMRVEPVGS 122 

Query: 121 ASSIVYRMFKENGVSVPKEIAGLLLSGLISDTLLLKSPTTHASDIPVAKELAELAGVNLE 180 

ASSIVYRMFKENG+ VPK +AG+LLSGLISDTLL1KSFTTH SD VA+EIAELA VNLE 
Sbjct: 123 ASSIWRMFKENGIEVPKAIAGMLLSGLISDTLLLKSPTTHVSDHLVAEELAELAEVNLE 182 

Query: 181 EYGLEMLKAGTNLSSKTAAELIDIDAKTFElJS _ GFATOVAQVrrrVDINDILARQEElEVAI 240 

+YG+ +LKAGTNL+SK+ ELI IDAKTFELNG AVRVAQVNTVDI ++L RQE IE AI 
Sbjct: 183 DYGMALLKAGTraASKSEVELIGIDAKTFELKGNAVRVAQVNTVDIAEVLERQEAIEAAI 242 

Query: 241 QEAIVTEGYSDFVIjMITDIWSNSEILALGSNMAKVEAAFEFTLENNHAFLAGAVSRKKQ 300 

++A+ EGYSDFVLMITDIVNSNSEILA+G+NM KVEAAF FTL+NNHAFLAGAVSRKKQ 
Sbjct: 243 KDAMAAEGYSDFVXMITDIWSNSEIIAIGANMDKVEAAFNFTLDNtmAFLAGAVSRKKQ 302 

Query: 301 WPQLTESYNA 311 

WPQLTES+ A 
Sbjct: 303 WPQLTESFGA 313 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1193 

A DNA sequence (GBSx1269) was identified in S.agalactiae <SEQ ID 3713> which encodes the amino
acid sequence <SEQ ID 3714>. Analysis of this protein sequence reveals the following:

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2769 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05773 GB:AF051356 pyruvate-formate lyase activating enzyme
[Streptococcus mutans]
Identities = 184/260 (70%), Positives = 217/260 (82%)

Query: 3 EIDYKKVTGMIHSTESFGSVIIGPGIRFIIFMCGCKI^CQYCHNPDTWENIETNNSKERTVE 62 

++DY+KVTG+++STESFGSVDGPGIRF++FMQGC+MRCQYCHNPDTW M+ + + ERT 
Sbjct: 4 KVDYEIOTGLVNSTESFGSVDGPGIRFWFMQGCQMRCQYCHNPDTWAMKNDRATERTAG 63 

Query: 63 DVLKEALRYKHFWGKIlGGITVSGGEAMLQIDFITAIiFIEAKKLGIHTTLDTCGFAYRATP 122 

DV KEALR+K FWG GGITVSGGEA LQ+DF+ ALF AK+ GIHTTLDTC +R TP 
Sbjct: 64 DVFKFALRFKDFWGDTGGITVSGGEATLQMDFBIALFSLAKEKGIHTTLDTCALTFRNTP 123 

Query: 123 EYHAILEKLLDOTDLV1LDLKEIDSEQHKIVTRQSNK1TILQFARYLSDRGTPVWIRHVLV 182 

+Y EKL+ VTDLVLLD+KEI+ +QHKIVT SNK IL ARYLSD G PVWIRHVLV 
Sbjct: 124 KYLEKYEKLMAVTDLvLLDIKEINPDQHKXVTGHSNKTIIACARYLSDIGKPWIRHVLV 183 

Query: 183 PGLTDIDDHLKRLGEFVQTLDNVDKFE\1PYHTMGEFKWRELGIPYPLAGvKPPTPERVK 242 

PGLTD D+ L +LGE+V+TL NV +FE+LPYHTMGEFKWRELGIPYPL GVKPPTP+RV+ 
Sbjct: 184 PGLTDRDEDLIKLGEYVKTLKNVQRFEILPYHTMGEFBCWRELGIPYPLEGVKPPTPDRVR 243 

Query: 243 NAKDIMKTESYTEYLKRIQN 262

NAK +M TE+Y EY KRI + 
Sbjct: 244 NAKKLMHTETYEEYKKRINH 263 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3715> which encodes the amino acid 
sequence <SEQ ID 3716>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4614 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 223/260 (85%), Positives = 239/260 (91%)

Query: 1 MAEIDYKKWGMIHSTESFGSVnGPGIRFIIFMCGCKNRCQYCHNPDTWEMETNNSKERT 60 
M E DY +VTGM+HSTESFGSVDGPGIRFIIF+^CK+RCQYCHNPDTWEMETNNSK RT

Sbjct: 25 MTEKDYGQVTGMVHSTESFGSVDGPGIRFIIFLQGCKLRCQYCHNPDTWEMETNNSKIRT 84 

Query: 61 WDVIim^YKHFWGKDGGITVSGGEAMLQIDFITALFIEAKKLGlHTTLDTCGFAYRA 120 
V DVLKEAL+YKHFWGK GGITVSGGEAMLQIDFITALFIEAKKLGIHTTLDTCGF YR 
Sbjct: 85 VlTOvLKEffiliQYKHFWGKKGGITVSGGEAMLQIDFITALFIEAKKLGIHTTLDTCGFTYRP 144

Query: 121 TPEYHAILEKLLDVTDLVTjLDLKEIDSEQHKIVTRQSNKNILQFARYLSDRGTPVWIRHV 180 

TPEYH +L+ LL VTDL+LLDLKEID +QHKIVTRQ NKNILQFARYLSD+ PVWIRHV 
Sbjct: 145 TPEYHQVI^DNLIiAVTDLIIjLDLKEIDEKQHKIVTRQPNKNILQFARYLSDKQIPVWIRHV 204 

Query: 181 LVPGLTDIDDHLKRLGEFVQTLDHVDKFEVLPYHTMGEFKl'JRELGIPYPLAGVKPPTPKK 240 

LVPGLTDIDDHL RLGEFV+TL NVDKFEVLPXHTMGEFKMRELGIPY L GVKPPT ER 
Sbjct: 205 LVPGLTDIDDHLTRLGEFVKTLKNVDKFEVLPYHTMGEFKWRELGIPYQLEGVKPPTKER 264 

Query: 241 VKNAKDIMKTESYTEYLKRI 260

V+NAK++M+TESYTEY+ RI 
Sbjct: 265 VQNAKNLMQTESYTEYMNRI 284 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1194 

A DNA sequence (GBSx1270) was identified in S.agalactiae <SEQ ID 3717> which encodes the amino
acid sequence <SEQ ID 3718>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.06 Transmembrane 105 - 121 ( 103 - 126) 
INTEGRAL Likelihood = -5.57 Transmembrane 137 - 153 ( 136 - 162) 

Final Results 

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05772 GB:AF051356 putative hemolysin [Streptococcus mutans]

Identities = 347/445 (77%), Positives = 406/445 (90%), Gaps = 1/445 (0%) 

Query: 1 MQDPGSQSLLLQFVILLILTLENAFFSASEMALVSLNRSKVEQKAEEGDKRYRRLLDVLE 60 
M+DPGSQSL+LQF++LLILTL NAFFSA+EMALVSLNR++VEQKAEEG+K+Y RLL VLE 
Sbjct: 1 MEDPGSQSLILQFLLLLILTLCNAFFSATEMALVSIjNRARVEQKREEGEKKyiRLLKVLE 60

Query: 61 NPIMFLSTIQVGITFISLLQGASLSASLGHVISGWLGNSATARTAGSIIALIFLTYVSIV 120
NPNNFLSTIQVGIT I+LL GASL+ SLG H- W GNSATARTAGS+I4L FLTY+SIV
Sbjct: 61 NPNNFLSTIQVGITLITLLSGASLADSLGREIAVWFGNSATARTAGSLISLAFLTYISIV 120

Query: 121 LGELYPKRIAMNLKDRLAI VSAPI I 1 FLGKI VS PFVWLLSASTNLLSRITPMTFDDADEK 180 

LGELYPKRIAMNLK+ LA++SAP+IIFLGK+VSPFVWLLS STNLLSR+TPMTFDDADEK 
Sbjct: 121 LGELYPKRIAMNLKENIAVLSAPVIIFLGKWSPFVWLLSVSTNLLSRLTPMTFDDADEK 180 

Query: 181 MTRDEIEYMLTNSEETLFJffiEIE^QGIFSLDEMMAREVlWPRTDAFMIDINWDAQSNIE 240 

MTRDEIEYMLTNSEETL+A+EIEMLQG+FSLDE+MAREVMVPRTDAFM+DIN+D+ 1+ 
Sbjct: 181 MTRDEIEYMLTNSEETLDADEIEMLQGVFSLD3LMAREVMVPRTDAFMVDINDDSSDIIQ 240 

Query: 241 GILSQNFSRVPVFDDDKDRWGVLHTKRLLEAGFKTGFDTIDLRKILQEPLFVPETIFVD 300

IL++ FSR+PV+DDDKD+++G++HTK LL AGFK GFD I+LR+ILQEPLFVPETI V+ 
Sbjct: 241 TIL^RFSRIPVYDDDKDKIIGIIHTKNLLNAGFKEGFDHII^RRILQEPLFVPETIVVN 300 

Query: 301 DLLKALRNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDTAEQFVREIDENIYI 360 
DLL AL+NTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETD VREI +N YI

Sbjct: 301 DLLTALKNTQNQMAILLDEYGGVAGLVT] J ED] J ) J EEIVGEIDDETDKTAISWEIADNTYI 360 

Query: 361 VLGTMTUJEFNDYFETELESDDVDTIAGYYLTGVGSIPNQEEKVAYEVDSKDKHITLIND 420 
VLGTMTLN+FN+YFET+LESD+VDTIAG+YLTGVG+IP+QEEK +EV+S KH+ LIND 
Sbjct: 361 VLGTMTIMDFNEYFETDLESDNVDTIAGFYLTGVGTIPSQEEKEHFEVESNGKHLELIND 420

Query: 421 KVKDGRITKLKVLLSDIEQ-NIEKD 444 

KVKDGR+TKLK+L+S++E+ EKD 
Sbjct: 421 KVKDGRVTKLKILVSEVEEKEDEKD 445 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3719> which encodes the amino acid
sequence <SEQ ID 3720>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.76 
INTEGRAL Likelihood = -5.57 
INTEGRAL Likelihood = -3.19 

Final Results

bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: 14 MEDPVSQSLVIQFLLLWLTLLNAFFSASEMALVSLNRSRVEQKAADGDKKYARLLRVLE 73 

MEDP SQSL++QFLLL++LTL NAFFSA+EMALVSLNR+RVEQKA +G+ICKY RLL+VLE 
Sbjct: 1 MEDPGSQSLILQFLLLLILTLCNAFFSATEMALVSljNRARVEQKAEEGEKKYIRLLKVLE 60 

Query: 74 EPNHFLSTIQVGITFISLLSGASLSASLGKVISGWLGNSATARTAGTIISLVFLTYVSIV 133 

PN+FLSTIQVGIT I+LLSGASL+ SLG+ 1+ W GNSATARTAG++ISL FLTY+SIV 
Sbjct: 61 NPNNFLSTIQVGITLITLLSGASLADSLGREIAVWFGNSATARTAGSLISLAFLTYISIV 120 

Query: 134 LGELYPKRIAMNLKDKLAIVSAPIIIGLGRLVSPFVI'irLLSASTNLLSRLTPMTFDDADEQ 193 

LGELYPKRIAMNLK+ LA++SAP+II LG++VSPFVWLLS STNLLSRLTPMTFDDADE+ 
Sbjct: 121 LGELYPKRIAMNLKENLAVLSAPVIIFLGKVVSPFvWLLSVSTNLLSRLTPMTFDDADEK 180 

Query: 194 MTRDEIEYMLSKSEATLDAKEIEMLQGVFSLDEMMAREVMVPRTDAFMIDINDDPLEN1Q 253 

MTRDEIEYML+ SE TLDA+EIEMLQGVFSLDE+MAREVMVPRTDAFM+DINDD + IQ 
Sbjct: 181 MTRDEIEYMLTNSEETLDADEIEMLQGVFSLDELMAREVMVPRTDAFMVDINDDSSD1IQ 240 

Query: 254 EILKQSFSRIPVYDVDKDKIIGLIHTKRLLESGFRQGFDQINMRKMLQEPLFVPETIFVD 313 
IL + FSRIPVYD DKDKIIG+IHTK LL +GF++GFD IN+R++LQEPLFVPETI V+
Sbjct: 241 TILNERFSRIPVYDDDKDKI IGI IHTKNLLMAGFKEGFDHINLRRILQEPLFVPETIWN 300 

Query: 314 DLLRQLPJTOQNQ>1AILLDEYGGVAGLVTLEDI.LEEIVGEIDDETDKAEQFVHEIGDNTYI 373 

DLL L+NTQNQMAI LLDEYGGVAGLVTLEDLLEE I VGEI DDETDK V EI DNTYI 
Sbjct: 301 DLLTALKNTQNQMAI LLDEYGGVAGLVTLEDLLEE I VGEI DDETDKTAI SVRE IADNTYI 360 

Query: 374 WGTMTLMFNDYFDTELESDDVDTIAGFYLTGIGTIPSQEQKEAYEIDNKDKHLVLIND 433 

V+GTMTLN+FN+YF+T+LESD+VDTIAGFYLTG+GTIPSQE+KE +E+++ KHL LIND 
Sbjct: 361 VLGTMTLNDFNEYFETDLESDNVDTIAGFYLTGVGTIPSQEEKEHFEVESNGKHLELIND 420 

Query: 434 KVKDGRITKLKLILSNIEQIIEE 456 

KVKDGR+TKLK+++S +E+ +E 
Sbjct: 421 KVKDGRVTKLKI LVSEVEEKEDE 443 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 364/444 (81%), Positives = 417/444 (92%)

Query: 1 MQDPGSQSLLLQFVILLILTLFNAFFSASEt4ALVSLNRSKVEQKAEEGDKRYRRLLDVLE 60 

M+DP SQSL++QF++L++LTL NAFFSASEMALVSLNRS+VEQKA +GDK+Y RLL VLE 
Sbjct: 14 MEDPVSQSLVIQFLLLWLTLLNAFFSASEMALVSLNRSRVEQKAADGDKKYARLLRVLE 73 

Query: 61 NPNNFLSTIQVGITFISLLQGASLSASLGHVISGWLGNSATARTAGSIIALIFLTYVSIV 120 

PN+FLSTIQVGITFISLL GASLSAELG VI EGW1GNSATARTAG+ 1 1 +L+FLTYVSI V 
Sbjct: 74 EPNHFLSTIQVGITFISLLSGASLSASLGKVISGWLGNSATARTAGTIISLVFLTYVSIV 133 

Query: 121 LGELYPKRIAMNLKDRLAIVSAPI 1 1 FLGKIVSPFVWLLSASTNLLSRITPMTFDDADEK 180 

LGELYPKRIAMNLKD+LAIVSAPIII LG++VSPFVWLLSASTNLLSR+TPMTFDDADE+ 
Sbjct: 134 LGELYPKRIA^LKDKLAIVSAPIIIGLGRLVSPFVWLLSASTMLLSRLTPMTFDrJADEQ 193 

Query: 181 MTRDEIEYMLTNSEETLEAEEIEMLQGIFSLDEMMAREVMVPRTDAFMIDINNDAQSNIE 240 

MTRDEIEYML+ SE TL+AEEIEMLQG+FSLDEMMAREVMVPRTDAFMIDIN+D NI+ 
Sbjct: 194 MTRDEIE YMLS KSEATLDAEE IEMLQGVFSLDEMMAREVMVPRTDAFM I D INDDPLENI Q 253 

Query: 241 GILSQNFSRVPVFDDDKDRWGVLHTKRLLEAGFKTGFDTIDLRKILQEPLFVPETIFVD 300 

IL Q+FSR+PV+D DKD+++G++HTKRLLE+GF+ GFD I++RK+LQEPLFVPETIFVD 
Sbjct: 254 EILKQSFSRIPVYDVDKDKIIGLIHTKRLLESGFRQGFDQINMRKMLQEPLFVPETIFVD 313 

Query: 301 DLLKALRWTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDTAEQFVREIDENIYI 360 

DLL+ LRNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETD AEQFV EI +N YI 
Sbjct: 314 DLLRQLRNTQNQMAILLDEYGGVAGLVTLEDLLEEIVGEIDDETDKAEQFVHEIGDNTYI 3 73 

Query: 361 VLGTMTIJJEFNDYFETELESDDVDTIAGYYLTGVGSIPNQEEKVAYEVDSKDKHITLIND 420 

V+GTMTLNEFNDYF+TELESDDVDTIAG+YLTG+G+ 1 P+QE+K AYE+D+KDKH+ LIND 
Sbjct: 374 WGTMTIjNEFNDYFDTELESDDVDTIAC-FYLTGIGTIPSQEQKEAYEIDNKDKHLVLIND 433 

Query: 421 KVKDGRI TKLKVLLSD IEQNI EKD 444 

KVKDGRITKLK++LS+IEQ IE+D 
Sbjct: 434 KVKDGRITKLKLILSNIEQI IEED 457 

SEQ ID 3718 (GBS70d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 120 (lane 8-10; MW 65kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 120 (lane 11 & 12; MW 44kDa) and in 
Figure 179 (lane 5; MW 35kDa). 

GBS70d-His was purified as shown in Figure 231, lane 9-10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1195

A DNA sequence (GBSx1271) was identified in S.agalactiae <SEQ ID 3721> which encodes the amino
acid sequence <SEQ ID 3722>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1212 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB84230 GB:AL162754 hypothetical protein NMA0960 [Neisseria 
meningitidis Z2491] 
Identities = 80/184 (43%) , Positives = 119/184 (64%) , Gaps = 3/184 (1%)

Query: 1 MIKRPIHLSHDFLAEVIDKEAITLDATMGNGNDTVFLAKSSK KVYAFDIQEEAIAKT 57 

++K + +H L + + + LD T GNG+DT+FLA+++ KV+AFDIQ +A+ T 
Sbjct: 2 LLKNILPFAHCLLRQALPEGGNALDGTAGNGHDTLFLAQTAGIRGKVWAFDIQPQALNNT 61 

Query: 58 KAKLTEQGI SNAEL I LDGHENLEQYvHTPLRAAIFNLGYLPSADKTVI TKPHTTI KAI KN 117 

+ +L E G SN LILDGHENL+QY+ PL AAIFN G+LP DK++ T+ T+I A+ 
Sbjct: 62 RCRLQEAGYSNVRLILDGHENLKQYIPKPLDAAIFNFGWLPGGDKSLTTRTETSIAALSA 121 

Query: 118 VLDILEVGGRLSLMvYYGHDGGKSEKDAVIAFVEQLPQNNFATMLYQPLNQVNTPPFLIM 177 
L +L+ G L ++Y GH+ GK E +A+ + + LPQ FA + Y N+ N+PP+L+ 

Sbjct: 122 2 

Query: 178 VEKL 181 
EKL 

Sbjct: 182 FEKL 185 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3723> which encodes the amino acid 
sequence <SEQ ID 3724>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 127 - 143 ( 123 - 143)

Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9101> which encodes the amino acid sequence 
<SEQ ID 9102>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.44 Transmembrane 118 - 134 ( 114 - 134) 

Final Results 

bacterial membrane Certainty= 0 . 157 (Affirmative) < suco 

bacterial outside Certainty= 0.000 (Not Clear) < suco 

bacterial cytoplasm Certainty= 0.000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below.

Identities = 124/184 (67%) , Positives = 156/184 (84%) 



Query: 1 MIKRPIHLSHDFLAEVIDKEAITLDATMGNGNDTVTLAKSSKKVYAFDIQEEAIAKTICAK 60 
M+KRPIHLSHDFLAEV+DK ++ +DATMGNGNDT FLA+ +KKVYAFD+QE+AI KT + 
Sbjct: 10 MLKRPIHLSHDFLAEVVDKSS\A/VDATMGNGMiaFIAQlAKKVYAFDVQEQAIRKTSER 69

Query: 61 LTEQ6ISNAELILDGHENLEQVVHTPLRAAIFM^GYLPSADKWITKPHTTII<AIKNVLD 120 

L + G+SNAELIL GHE ++QYV P+RARIFNLGYLPSADK++IT P+TT++A+ +h 
Sbjct: 70 LAQLGLSNAELI LAGHEA vTJQYVTEPVRAAI FNLGYLPSADKS 1 1 TLPNTTLQALSKLLT 129 

Query: 121 ILEVGGRLSLIWYYGHDGGKSEI<DAVIAFVEQL,PQKNFATMLYQPLNQVHTPPFLIMVEK 180 

+L VGGR+++MVYYGHDGG EKDA++ FV+QL Q + MLYQPLNQVNTPPFLIM+EK 
Sbjct: 130 LLMVGGRIAIMVYYGHDGGSLEKDALLDFVKQLDQRKVSAMLYQPLNQVNTPPFLIMLEK 189 

Query: 181 LQSY 184 
L + 

Sbjct: 190 LADF 193 
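
The summary line that heads each alignment (for example "Identities = 124/184 (67%) , Positives = 156/184 (84%)") can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis; the function names are hypothetical, and the assumption that percentages are truncated rather than rounded is inferred only from the figures quoted in this text (156/184 is 84.8% but is reported as 84%).

```python
import re

# Matches fields such as "Identities = 124/184 (67%)", tolerating stray spaces.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_FIELD.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage with truncation (assumed convention, see lead-in)."""
    return (100 * count) // length


line = "Identities = 124/184 (67%) , Positives = 156/184 (84%)"
for name, (count, length, reported) in parse_summary(line).items():
    status = "ok" if truncated_percent(count, length) == reported else "mismatch"
    print(f"{name}: {count}/{length} reported {reported}% -> {status}")
```

A mismatch flagged by such a check usually points to a character misread in the counts rather than to an error in the underlying alignment.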

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1196 

A DNA sequence (GBSx1272) was identified in S.agalactiae <SEQ ID 3725> which encodes the amino
acid sequence <SEQ ID 3726>. Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0 . 1948 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00380 GB:AF008220 YtqA [Bacillus subtilis] 
Identities = 161/302 (53%) , Positives = 220/302 (72%) , Gaps = 4/302 (1%) 

Query: 2 KKRYRAINDYYRELFGEKIFKLPIDAGFDCPNRDGTVARGGCTFCTVSGSGDAIVAPEAP 61 
+KRY +N + RE FG K+FK+ +D GFDCPNRDGTVA GGCTFC+ +GSGD 



Query: 62 IREQFYKEIDFMHRKWPEVNKYLWFQNFTNTHAKLEIIKERYEQAINEPGVIGINIGTR 121 

+ QF+ + MH KW + KY4 YFQ FTNTHA +E+++E++E + V+GI+I TR 
Sbjct: 73 LITQFHDIKNRMHEKWKD-GKY1AYFQAFTNTHAPVEVLREKFESVLALDDWGISIATR 131 

Query: 122 PDCLPDETIYYLAELSERMHVTLELGLQTTYEaTSALINRAHSYDLYKKTVKRIRELAPK 181 

PDCLPD+ + YLAEL+ER ++ +ELGLQT +E T+ LINRAH ++ Y + V ++R+ 
Sbjct: 132 PDCLPDDVVDYLAELNERTYLWVELGLQTVHERTALLINRAHDFNCY\'EGVNKLRI<HG-- 189 

Query: 182 vEIVSHLlNGLPGETHDD/MvENVRROTTDNDIQ^IKLHLLHLMTNTRMQRDYHEGRLRLL 241 

+ + SH+INGLP E DMM+E + V D D+QGIK+HLLHL+ T M + Y +G+L L 
Sbjct: 190 IRVCSHIINGLPLEDRD^I^WETAK-AVADDDVC<31KIHI,LHI J LKGTPMVKQYEKGKLEFL 248 

Query: 242 SQEDYISIICDQLEIIPKHIVIHRITGDAPRHMLIGPIWSLNKWEVLNAIDKEMEKRQSY 301 

SQ+DY+ ++CDQLEIIP +++HRITGD P ++ IGPMWS +NKWEVL AI+KE+E R SY 
Sbjct: 249 SQDDYVQLVCDQLEIIPPEMIVHRITGDGPIELMIGPI-1WSVNKWEVLGAINKELENRGSY 308 

Query: 302 QG 303 
QG 

Sbjct: 309 QG 310 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3727> which encodes the amino acid 
sequence <SEQ ID 3728>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

»> Seems to have no N-terminal signal sequence 
Final Results

bacterial cytoplasm --- Certainty=0.2023 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 260/307 (84%), Positives = 290/307 (93%), ( 



Query: 1 MKKRYRAINDYYRELFGEKIFKLPIDAGFDCPNRDGTVARGGCTFCTVSGSGDAIVAPEA 60
MKKRY+ +N++YR+LFG K+FK+PIDAGFDCPNRDGTVA GGCTFCTVSGSGDAIVAP+A
Sbjct: 7 MKKRYQTLNEHYRQLFGAKMFKVPIDAGFDCPNRDGTVAHGGCTFCTVSGSGDAIVAPDA 66

Query: 61
Sbjct: 67

Query: 121
RPDCLPD+TI YLAELSERMHVT+ELGLQTTYE TS LINRAHSYDLYK+TV+R+R
Sbjct: 127

Query: 181
- IVSHLINGLP ETHDMM+ENVRRCVTDNDIQGIKLHLLHLMTNTRMQRDYHEGRL+L
Sbjct: 186

Query: 241
LSQ+DY+SI I CDQLEI IPKHI VIHRITGDAPR KLIGPMWSLNKWEVLNAIDKEME+R S
Sbjct: 246

Query: 301
Sbjct: 306



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1197

A DNA sequence (GBSx1273) was identified in S.agalactiae <SEQ ID 3729> which encodes the amino
acid sequence <SEQ ID 3730>. Analysis of this protein sequence reveals the following: 

Possible site:

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = Transmembrane 10 - 26 ( 6 - 30)

INTEGRAL Likelihood = Transmembrane 93 - 109 ( 87 - 112)

INTEGRAL Likelihood = Transmembrane 163 - 179 ( 161 - 181)

INTEGRAL Likelihood = Transmembrane 189 - 205 ( 185 - 205)

INTEGRAL Likelihood = Transmembrane 58 - 74 ( 58 - 74)

INTEGRAL Likelihood = Transmembrane 130 - 146 ( 130 - 146)



Final Results 

bacterial membrane Certainty=0. 4927 (Affirmative) < succ: 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 
Sbjct: 



, Gaps = 3/159 (1%) 

ISFDQTIQESVRGQLPNLSTRFFKLITvTGNTVSQIAIAIMSVTFCY- -LKKWYPQARFI 91 
+ FD+ + V+G L T K T IG+T S I ++++ + F Y LK F 
LKFDEDVISLVQGWESPLLTDIMKFFTYIGSTASLIILSLVILFFLYRILKHRLELVLFT 93 
Sbjct: 94




/-MVGSPLimMVKLFFQRARPDLHRLIDIGGYSFPSGHAMHAFSLYGILTFLLWRHIT 152 



152 KSIWKLLCQGTIfiLLIFLIGLSRIYLGVHFPTDVLAGFI 190 

++L L+I IG+SRIYLGVH+P+D++AG++ 

153 ARWARILLILFSMLMILSIGISRIYLGVHYPSDIIAGYL 191 



A related DNA sequence was identified in S. pyogenes <SEQ ID 1851> which encodes the amino acid 
sequence <SEQ ID 1852>. Analysis of this protein sequence reveals the following: 



Possible site: 15

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11

INTEGRAL Likelihood =-10

INTEGRAL Likelihood = -8

INTEGRAL Likelihood = -3

INTEGRAL Likelihood = -2

INTEGRAL Likelihood = -1.54

Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/197 (44%) , Positives = 134/197 (67%) , ( 



Query: 1 MLSRQNSKLIQAFIAIILFFSLGLVIKYl'JPDTVISFDQTIOESVRGQLPNLSTRFFKLIT 60
M ++Q LI +F A+++F +G +K++P+ + D TIQ +RG LP + T+FF+ +T
Sbjct: 2 MTNKQTHFLIASF-ALLIFVIIGYTVKFFPERIjALIiDNTIQAEIRGNLPIVLTQFFRGVT 60

Query: 61
KW +A FI N 1+ I +LKL +QR RP + HLV
Sbjct: 61

Sbjct: 121

Query: 181
+P+D+LAGF+L +GIL+
Sbjct: 181



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1198 

A DNA sequence (GBSx1274) was identified in S.agalactiae <SEQ ID 3731> which encodes the amino
acid sequence <SEQ ID 3732>. Analysis of this protein sequence reveals the following: 
Possible site: 58 



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.44 Transmembrane 35 - 51 ( 33 - 59)

INTEGRAL Likelihood = -5.53 Transmembrane 193 - 209 ( 179 - 211)

INTEGRAL Likelihood = -4.46 Transmembrane 64 - 80 ( 60 - 82)

INTEGRAL Likelihood = -4.09 Transmembrane 108 - 124 ( 103 - 128)

INTEGRAL Likelihood = -2.71 Transmembrane 150 - 166 ( 148 - 166)

INTEGRAL Likelihood = -0.06 Transmembrane 174 - 190 ( 174 - 190)



Final Results

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9977> which encodes amino acid sequence <SEQ ID 9978> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC83944 GB:L47648 putative [Bacillus subtilis] 
Identities = 53/186 (28%) , Positives = 109/186 (58%)

Query: 33 RKMVTIAILSALSFVLM^SFPLIPGAEFLKVDFSILPMLVAFILFDLKSSYGVLLLRSL 92
+K+V +++LS+++FVLM+++FP ++LK+DFS +P ++A +++ + ¥ ++++
Sbjct: KKLVWSMLSSIAFVLMLLNFPFPGLPDYLKIDFSDVPAIIAILIYGPLAGIAVEAIKNV 63

Query: 93 LKVIIANRGPETFIGLPMNMVAIALFIASFAi™ 152
L+ 1+ +G N +A LF+ A +K SAK + L GT ++T+ M
Sbjct: 64 LQYIIQGSMAGVPVGQVANFIAGTLFILPTAFLFKKLNSAKGLAVSLLLGTAAMTILMSI 123

Query: 153 nKYVFAIPLYAIFANFDIRTFIGVGNYLLTMVIPFNIVEGILISIVFYLTYVACLPILER 212
LNYV +P Y F + + + ++ ++PFN+++GI+I++VF L ++ P +E+
Sbjct: 124 LNYVLILPAYTWFLHSPALSDSALKTAWAGILPFNMIKGIVITWFSLIFIKLKPWIEQ 183

Query: 213 YKKTNV 218

Sbjct: 184 QRSAHI 189





A related DNA sequence was identified in S.pyogenes <SEQ ID 3733> which encodes the amino acid 

sequence <SEQ ID 3734>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.48 Transmembrane 82 - 98 ( 74 - 100)

INTEGRAL Likelihood = -3.93 Transmembrane 161 - 177 ( 152 - 178)

INTEGRAL Likelihood = -3.61 Transmembrane 108 - 124 ( 107 - 126)

INTEGRAL Likelihood = -3.61 Transmembrane 33 - 49 ( 31 - 50)

Final Results

bacterial membrane --- Certainty=0.3590 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC83944 GB:L47648 putative [Bacillus subtilis] 
Identities = 46/182 (25%) , Positives = 97/182 (53%) 

Query: 3 KTHKMIMIGILSAI SFLLMLVSFAI IPGAAFLKIEFS 1 1 PVLFGLMIMDLKSAYLILLLR 62 

K K++++ +LS+I+F+LML++F +LKI+FS +P + ++I + + ++ 

Sbjct: 2 KVKKLVWSMLSSIAFVLMLLNFPFPGLPDYLKIDFSDVPAIIAILIYGPIAGIAVEAIK 61 

Query: 63 SLLKLFLNNRGVNDFIGLPMNI IAIALFVTAFALVWNRQKTLSQYVFASLLGTGLLTFGM 122 

++L+ + +G N IA LF+ A ++ + + + LLGT +T M 

Sbjct: 62 NVLQYIIQGSMAGVPVGQVANFIAGTLFILPTAFLFICKLNSAKGLAVSLLLGTAAMTILM 121 



Query: 123 VVI^TFAIPLYAIFANIDIPAYIGVTKYMMTWIPFNLVEGLIFAITFYFVYIASKPIL 182 
+LNY +P Y F + + + ++ ++PFN+++G++ + F ++I KP +

Sbjct: 122 SILNYVLILPAYTWFLHSPALSDSALKTAWAGILPFNI^IKGIVITVVFSLIFIKLKPWI 181 



Query: 183 ER 184 
E+ 

Sbjct: 182 EQ 183



An alignment of the GAS and GBS proteins is shown below. 

Identities = 110/185 (59%) , Positives = 144/185 (77%) 



Query: 29 MTNTRKWTIAILSALSFvLMMVSFPLIPGAEFLKVDFSILPMLVAFILFDLKSSYGVLL 88

M+ T KM+ I ILSA+SF+LM+VSF +IK3A FLK++FSI+P+L ++ DLKS+Y +LL
Sbjct: 1 MSKTHmiMIGILSAISFLLMLVSFAIIPGAAFLKIEFSIIPVLFGLMIMDLKSAYLILL 60

Query: 89 LRSLLKVILAmGPETFIGLPMNTWMALFIASFAIFWKNRESAKDFIKASLFGTVSLTV 148 

LRSLLK+ L NRG FIGLPMN++A+ALF+ +FA+ W +++ ++ ASL GT LT 
Sbjct: 61 LRSLLKLFLMTOGVM^FIGLP^IIAIALFVTAFALVWNRQKTLSQYVFASLLGTGLLTF 120 

Query: 149 SMVALNYVFAIPLYAIFANFDIRTFIGVGNYLLT^IPFNIVEGILISIVFYLTYVACLP 208 

MV LNY FAI PLYAI FAN DIR +IGV Y++TMVIPFN+VEG++ +1 FY Y+A P 
Sbjct: 121 GMVVLNYTFAIPLYAIFANIDIRAY1GVTKYMMTMVIPFNLVEGLIFAITFYFVYIASICP 180 

Query: 209 ILERY 213 

ILERY 
Sbjct: 181 ILERY 185 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1199 

A DNA sequence (GBSx1275) was identified in S.agalactiae <SEQ ID 3735> which encodes the amino
acid sequence <SEQ ID 3736>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-11.04 Transmembrane 278 - 294 ( 270 - 298) 



Final Results 

bacterial membrane --- Certainty=0 . 5416 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 
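
The "Final Results" block lists one certainty score per candidate localisation, and the predicted compartment is the one with the highest score. As an illustration only, the sketch below shows how such a block could be read programmatically; it is a hypothetical helper written for this text (not part of the prediction program itself), and it deliberately tolerates the stray spaces and missing dashes seen in the printed output.

```python
import re

# One result line: location, optional dashes, "Certainty=<score> (<verdict>)".
# Stray spaces inside the score (e.g. "0 . 5416") are accepted and removed later.
RESULT_LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\D*?"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a 'Final Results' block."""
    return {
        location: (float(score.replace(" ", "")), verdict)
        for location, score, verdict in RESULT_LINE.findall(text)
    }


def predicted_location(results):
    """Location with the highest certainty score."""
    return max(results, key=lambda location: results[location][0])


sample = """
bacterial membrane --- Certainty=0 . 5416 (Affirmative) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco
"""
results = parse_final_results(sample)
print(predicted_location(results), results)
```

For the block above this selects the membrane localisation, in line with the "(Affirmative)" flag printed next to that score.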

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3736 (GBS150) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 7; MW 29.7kDa) and in Figure 175 (lane 4 & 5; MW 30kDa). 

Purified GBS150-His is shown in Figure 110A, Figure 199 (lane 5) and Figure 227 (lanes 6-7).

The purified GBS150-His fusion product was used to immunise mice (lane 1+2 product; 20 µg/mouse). The
resulting antiserum was used for Western blot (Figure 110B), FACS (Figure 110C), and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS
bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1200 

A DNA sequence (GBSx1276) was identified in S.agalactiae <SEQ ID 3737> which encodes the amino
acid sequence <SEQ ID 3738>. This protein is predicted to be a fimbria-associated protein. Analysis of this
protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-15.34 Transmembrane 264 - 280 ( 257 - 285) 
INTEGRAL Likelihood = -7.64 Transmembrane 23 - 39 ( 12 - 41) 

Final Results 
bacterial membrane Certainty=0. 7135 (Affirmative) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 
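
Each "INTEGRAL" line reports a likelihood score and the residue range of a predicted transmembrane segment, followed by a second range in parentheses. The following Python sketch shows one way to extract these values; the field names `span` and `paren_span` are hypothetical labels chosen here, since the text does not state what the parenthesised range represents.

```python
import re

# "INTEGRAL Likelihood =-15.34 Transmembrane 264 - 280 ( 257 - 285)"
INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(text):
    """Return predicted segments, most negative likelihood first."""
    segments = [
        {
            "likelihood": float(score),
            "span": (int(start), int(end)),
            "paren_span": (int(p_start), int(p_end)),
        }
        for score, start, end, p_start, p_end in INTEGRAL_LINE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg["likelihood"])


sample = """
INTEGRAL Likelihood =-15.34 Transmembrane 264 - 280 ( 257 - 285)
INTEGRAL Likelihood = -7.64 Transmembrane 23 - 39 ( 12 - 41)
"""
for seg in parse_integral(sample):
    start, end = seg["span"]
    print(seg["likelihood"], seg["span"], "length", end - start + 1)
```

In the segments listed in this section the first range always covers 17 residues, which gives a quick sanity check when a digit has been misread.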

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii] 
Identities = 95/271 (35%) , Positives = 139/271 (51%) , Gaps = 16/271 (5%) 

Query: 29 VGLLITSYPFISNWYYNIKANNQVTNFDNC/TQKWKEIimRFEtAKAYNRTLDPSRLSD 88 

+GLL +YP ++W 4- ++ Q + + E A AYN L + + 

Sbjct: 1 MGLL- -TYPTAASWVSQYNQSKVTADYSAQVDGARP-DAKTQVEQAHAYNDALSAGAVLE 57 

Query: 89 PYTE KEKKGIAEYAHMLEIAE- -MIGYIDIPSIKQKLPIYAGTTSSVLEKGAGH 140 

K +YA++L+ ++ + IPSI LP+Y GT L KG GH 

Sbjct: 58 ANNHVPTGAGSSKDSSLQYANILKANNEG1MARLKIPSISLDLPVYHGTADDTLLKGLGH 117 

Query: 141 LEGTSLPIGGKSSHTVITAHRGLPKAKLFTDLDKLKKGKIFYIHNIKEVLAYKVDQISW 200 

LEGTSLP+GG+ + 4VIT HRGL +A +FT+LDK+K G + EVL Y+V W 
Sbjct: 118 LEGTSLPVGGEGTRSVITGHRGLAEATMFTNLDKVKTGDSLIVEVFGEVLTYRVTSTKVV 177 

Query: 201 KPDNFSKLLWKGKDYATLLTCrPYSINSHRr,LVSGHRIKYi/PPVKEKNYLMKELQTHYK 260 

+P+ L V +GKD TL+TCTP IN+HR+L+ G RI Y P K+ K + 

Sbjct: 178 EPEETEALRVEEGKDLLTLVTCTPLGINTHRILLTGERI-YPTPAKDLAARGKRPDVPHF 236 

Query: 261 LYFLLS I LVIL I LVALLL YLKRKFKER 287 

++ + + Ll+V L L Y + KER 
Sbjct: 237 PWWAVGLAAGLIWGIiYLWRSGYAAARAKER 267 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3739> which encodes the amino acid
sequence <SEQ ID 3740>. Analysis of this protein sequence reveals the following: 
Possible site: 49 

ninal signal sequence 

imembrane 225 - 241 ( 220 - 248) 

Final Results 

bacterial membrane Certainty=0 . 6604 (Affirmative) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii] 
Identities = 94/250 (37%) , Positives = 133/250 (52%) , Gaps = 17/250 (6%) 

VECYRDRQLLSTYHKQVTQKKPSEMEEVWQKAKAYNARLGIQPVPDAF- - SFRD 52 

V Y ++ + Y QV +P +V ++A AYN h V +A S +D 

VSQVNQSKVTADYSAQVDGARPDAKTQV-EQAHAYNDALSAGAVLEANNHVPTGAGSSKD 71 







Sbjct: 


13 




53 


Sb j Ct : 


72 


Query: 


113 


Sbjct: 


130 




173 


Sbjct: 


190 




233 



GKD TLVTCTP G+NT R+L+ G RI 
Sbjct: 244 AAGLIWGLY 253

An alignment of the GAS and GBS proteins is shown below. 

Identities = 93/192 (48%) , Positives = 130/192 (67%) , Gaps = 2/192 (1%) 


Query: 52 VraFDNQTQKLNTKEINRRFELAKAYNRTLDPSRLSDPYTEKEKKGIAEYAHMLEIA--E 109 

++ + Q + E+ ++ AKAYN L + D ++ ++ Y +L+I + 

Sbjct: 10 LSTYHKQVTQKKPSEMEEVWQKAKAYNARLGIQPVPDAFSFRDGIHDKNYESLLQ1ENND 69 

Query: 110 MIGYIDIPSIKQKLPIYAGTTSSVLEKGAGHLEGT3LFIGGKSSHTVITAHRGLPKAKLF 169

++GY+++PSIK LPIY TT VL KGAGHL G++LP+GG +HTVI+AHRGLP A++F 
Sbjct: 70 IMGYVEVPSIKVTLPIYHYTTDEVLTKGAGHLFGSALPVGGDGTHTVISAHRGLPSAEMF 129 

Query: 170 TDLDKLKKGKIFYIHNIKEVIAYKVDQ1SVVKPDNFSICLLVVKGKDYATLLTCTPYSINS 229 
T+L+ +KKG FY + +VLAYKVDQI V+PD + L V GKDYATL+TCTPY +N+

Sbjct: 130 TNIjNLVIOCGDTFYFRvIJtfKvIAYKVDQILT^ 189 

Query: 230 HRLLVRGHRIKY 241 
RLLVRGHRI Y 
Sbjct: 190 KRLLVRGHRIAY 201

SEQ ID 3738 (GBS210) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 50 (lane 3; MW 61kDa). 

GBS210d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 152 (lane 2-4; MW 54kDa) and in Figure 187 (lane 9; MW 54kDa). It was also expressed
in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 154 (lane 2-4;
MW 28.7kDa) and in Figure 182 (lane 13; MW 29kDa). Purified GBS210d-GST is shown in lane 4 of
Figure 237.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1201 

A DNA sequence (GBSx1277) was identified in S.agalactiae <SEQ ID 3741> which encodes the amino

acid sequence <SEQ ID 3742>. This protein is predicted to be a fimbria-associated protein. Analysis of this 

protein sequence reveals the following: 

Possible site: 42

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.61 Transmembrane 20 - 36 ( 15 - 40)
INTEGRAL Likelihood = -7.27 Transmembrane 259 - 275 ( 258 - 277)

Final Results

bacterial membrane --- Certainty=0.5246 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii] 
Identities = 76/219 (34%) , Positives = 120/219 (54%) , Gaps = 12/219 (5%) 

Query: 28 LSILLYPWSRFYYTIESNNQTQDFERAAKKLSQKEINRRMALAQAYNDSLN N 80

+ +L YP + + + T D+ A ++ + ++ A AYND+L+ N 

Sbjct: 1 MGLLTYPTAASWSQYNQSKVTADYS-AQVDGARPDAKTQVEQAHAYNDALSAGAVLEAN 59 



Query: 81 vHLEDPYEKKRIQKGVAEYARMLEVSEK- - IGTISVPKIGQKLPIFAGSSQEVLSKGAGH 138 
Query: 139 LEGTSLPIGGNSTHTVITAHSGIPDKELFSNLKKLKKSDKFYIQNIKETIAYQVDQIKW 198

LEGTSLP+GG T +VIT H G+ + +F+NL K+K GD ++ E + Y+V KW 
Sbjct: 118 LEGTSLPVGGEGTRSVITGHRGLAEATMFTIILDK^/KTGDSLIVEVFGEVLTYRVTSTKW 177 

Query: 199 TPDNFSDLLWPGHDYATLLTCTPIMINTHRLLVRGHRI 237 

P+ L V G D TL+TCTP+ INTHR+L+ G RI 
Sbjct: 173 EPEETEALRVEEGKDLLTLVTCTPLGINTHRI1.LTGERI 216 

There is also homology to SEQ ID 3740. 

A related GBS gene <SEQ ID 8749> and protein <SEQ ID 8750> were also identified. Analysis of this

protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 10 
McG: Discrim Score: 9.66 
GvH: Signal Score (-7.5): -6.53 

Possible site: 42 
»> Seems to have an uncleavable N-term signal seq 
ALOM program count: 2 value: -10.61 threshold: 0.0 

INTEGRAL Likelihood =-10.61 

INTEGRAL Likelihood = -7.27 

PERIPHERAL Likelihood = 5.14 216 
modified ALOM score: 2.62 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0. 5246 (Affirmative) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm — Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

33.4/53.0% over 277aa 

Actinomyces 

naeslundii 

GP|3036999| putative fimbria-associated protein Insert characterized
ORF00563 (382 - 1179 of 1479) 

GP|3036999|gb|AAC13546.1| |AF019629(1 - 278 of 365) putative fimbria-associated protein
{Actinomyces naeslundii} 
%Match =13.4 

%Identity =33.3 %Similarity =53.0 

Matches = 90 Mismatches = 118 Conservative Sub.s = 53 

180 210 240 270 300 330 360 390 

WIMKRRQSKEA*G*SLMMYKRS*SCAYDLRVFQ*KYS*IISKSHYLGDDV'KTKKIIKKTKKKKKSNLPFIILFLIGLSI 



LLYPWSRFYYTIESNNOTQDFERAAKKLSQKEINRRMALAQAYNDSLN NVHLEDPYEKKRIQKGVAEYARML 

lll=: = I I: I := : :: |:||||:|: I h I : :|| :| 

LTYPTAASWVSQYNQSKVTADYS - AQVDGARPDAKTQVEQAHAYNDALSAGAVLEANNHV- - PTGAGSSKDSSLQYANIL 



EVS--EKIGTISVPKIGQKLPIFAGSSQEVLSKGAGHLEGTSLPIGGNSTHTVITAHSGIPDKELFSNLKKLKKGDKFYI 
= = : : :| I II- h= = I II MIHIUMI I =111 I h = :|:|| hi || : : 

KANNEGLMARLKI PS I SLDLPVYHGTADDTLLKGLGHLEGTSLPVGCEGTRSVITGHRGLAEATMFTNLDKVTCTGDSLI V 



873 903 933 963 993 1023 1053 1083 

QNIKETIAYQVDQIKWTPDNFSDLLWPGHDYATLLTCTPIMIOTHRLLTOGHRIPYKGPIDEKLIKDGHLNTIYRYLF 



WO 02/34771 



PCT/GB01/04789 



1098 1179 1209 1239 1269 1299 

Y ISLVIIAWLLWL--1KRQRQKTO-IASWKGIES*WEENFRKTLR1JRSP*IDG*M*A*YYCS*LVF**PHILLF 

l = - II I I I II I I : = :| I - I I : 



250 260 270 280 290 300 310 

SEQ ID 8750 (GBS212) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 4; MW 36kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 50 (lane 2; MW 61kDa). 

Purified Thio-GBS212-His is shown in Figure 244, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1202 

A DNA sequence (GBSx1278) was identified in S.agalactiae <SEQ ID 3743> which encodes the amino
acid sequence <SEQ ID 3744>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

»> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-10.40 Transmembrane 680 - 696 ( 674 - 699) 

Final Results 

bacterial membrane — Certainty=0 . 5161 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA57459 GB:X81869 orf2 [Lactobacillus leichmannii] 
Identities = 84/325 (25%) , Positives = 122/325 (36%) , Gaps = 94/325 (28%) 

Query: 397 vNWYTLKDKD KTVASVSLTKTSKGTI DLGNGIKFEVSGNF 437 

VNV + +KDKD TV+ LTK++ T+ D G + F+ + 

Sbjct: 236 VNVPWNIKDKDTFJWVDKPDTGIDIDASTVSIDGLTKSTDYTVNKKDNGYQWFKTT--- 292 

Query: 438 SGKFTCLENKSYMISERVSGYGSAINLENGKVTITNTKDSDNPTPLNPTEPKVETHGKKF 497 

S L KS 1+ K T+TN D + T +G 

Sbjct: 293 SAA.VQAIAGKSLTITY KATLTNNATPDKA--IGNTATLSIGNGTNI 336 

Query: 498 VKTNEQGDRL- -AGAQFWKNSAGICYLALKADQSEGQKTIiAAKKIALDEAIAAYNKLSAT 555 

T G R+ GAQFV K+S + KTLA + L + + N +S 

Sbjct: 337 TSTPANGPRIYTGGAQFVKKDS QSNKTLAGAEFQLVKVDSNGNIVSYA 384 

Query: 556 DQKGEKGITAKELIKTKQADYDAAFIEARTAYEWITDKARAITYTSNDQGQFEVTGLADG 615 

Q + +Y W A TYTS+ G + GL+ 

Sbjct: 385 TQASDG SYTWNDSATEATTYTSDANGLVALKGLSYS 420 

Query: 616 TYNLEETLAPAGFAKLAGNIKFVVNQGSYITGGNIDYVANSNQKDATRVENKK 668 

+Y L E AP G+AKL +KF + QGS+ G+ + + N K+ 
Sbjct: 421 DKLDSGESYALLEIQAPDGYAKLDSPVKFSITQGSF- - 

Query: 669 VTIPQTGGIGTILFTIIGLSIMLGA 693 

+P TGG G +F IG+ IM+ A 
Sbjct: 471 -LLPSTGGKGIYIFLAIGIVIMIVA 494 



No corresponding DNA sequence was identified in S.pyogenes. 
SEQ ID 3744 (GBS59) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 8; MW 120kDa), in Figure 11 (lane 9; MW 100kDa) and in Figure 13
(lane 6; MW 74kDa).

GBS59-His was purified as shown in Figure 193, lane 2. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1203 

A DNA sequence (GBSx1279) was identified in S.agalactiae <SEQ ID 3745> which encodes the amino
acid sequence <SEQ ID 3746>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.13 Transmembrane 870 - 886 ( 864 - 887)

Final Results

bacterial membrane --- Certainty=0.2253 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD33086 GB:AF071083 fibronectin-binding protein I [Streptococcus pyogenes]

Identities = 58/176 (32%) , Positives = 83/176 (46%) , Gaps = 19/175 (10%) 
Sbjct: 8 KLSFLLSLTGFILGLLLVFIGLSGVSVGHAETRNGANKQGSFEIKKVDONNKPLPGATSS

Sbjct: 68 LTSKDGKGTSVQTFTSNDKGIVDAQNLQPGTYTLKEETAPDGYDKTSRTWTVTVYENGYT

Query: 121 TIQNSGDKNST I GQNQEELDKQYPPTG I YEDTKESYKLEHVKGSVPN - - GKSEAKA 174
+ + I + +D S +LE+ K SV + GK+E +
Sbjct: 128 KLVENPYNGEI I SKAGS KDVSSSLQLENPKMSWSKYGKTEVSS 171





Identities = 31/92 (33%) , Positives = 49/92 (52%) , Gaps = 14/92 (15%) 

Query: 725 PTITIKNEKKLGEIEFIKVDKDIMKLDLKGATFELQEFNEDYKLYLPIKMMSKWrGEW 784 

P+IT+ N K++ ++ F K+ DN + L A FEL+ N N+ K+ N 

Sbjct: 501 PSITVANLKRVAQLRFKKMSTDN- -VPLPEAAFELRSSN GNSQKLEASSN 548 

Query: 785 - -GKISYKDLKDGKYQLIEAVSPEDYQKITNK 814

G++ +KDL GYLE +P+ YQ++T K 
Sbjct: 549 TQGEVHFKDLTSGTYDLYETKAPKGYQQVTEK 580 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3746 (GBS67) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 10; MW 140kDa), in Figure 11 (lane 10; MW 150kDa) and in Figure 12
(lane 6; MW 95.3kDa).

GBS67-His was purified as shown in Figure 192, lane 10. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1204

A DNA sequence (GBSx1280) was identified in S.agalactiae <SEQ ID 3747> which encodes the amino
acid sequence <SEQ ID 3748>. This protein is predicted to be Nra. Analysis of this protein sequence 
reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .2020 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9979> which encodes amino acid sequence <SEQ ID 9980> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3749> which encodes the amino acid 
sequence <SEQ ID 3750>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

:ermin; 

L.75 Transmembrane 393 - 409 ( 392 - 409) 



Final Results 

bacterial membrane --- Certainty=0. 1702 (Affirmative) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0 .0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 122/325 (37%) , Positives = 185/325 (56%) , Gaps = 5/325 (1%)

Query: 7 LIENYLEKDILNQIKLLTLCY--DYYPSITIiDKSCHQLGLSELLIRKYCHDLTTLFNSQL 64 

LIE YLE I +4 +L+ L + Y P + + + GL+ L + YC +L F L 
Sbjct: 1 LIEKYLESSIESKCQLIVLFFKTSYLP ITEVAEKTGLTFLQLNHYCEELNAFFPGSL 57 

Query: 65 SLNIEKSTIVYQSNGVTREQAFKYIYHQSHVLQLLKFLITNDSGRLPLTYFSEKFGLSCA 124 

S+ I+K I Q +E +Y S+VLQLL FLI N S PLT F+ LS + 

Sbjct: 58 SMTIQKRMISCQFTHPFKETYLYQLYASSNVLQLLAFLIKNGSHSRPLTDFARSHFLSNS 117 

Query: 125 TAYRIRKHISPLLEKIGFQIVKNTITGDEYRIRYLIAFLNAQFGIEVYPMSIQNIDKLLIKR 184 

+AYR+R+ + PLL H UN I G+EYRIRYLIA L ++FGI+VY +++ DK I 

Sbjct: 118 SAYRMREALIPLLRNFELKLSKNKIVGEEYRIRYLIALLYSKFGIKVYDLTQQDKNTIHS 177 

Query: 185 LLLEHSTTFTASHYFPNTFIFFDTLLSLSWKRINYKVWPYSSLFTELQNIFIYDTLQYC 244 

L ST S + +F F+D LL+LSWKR ++V +P + +F 4-L+ +F+YD+L+ 
Sbjct: 178 FLSHSSTHLKTSPWLSESFSFYDILLALSWKRHQFSVTIPQTRIFQQLKKLFVYDSLKKS 237 

Query: 245 VKWIIDSFKINLKKDDIDYIFLAYLTSHNSFSNPKKTEKRIDNVIAIFENYPKFQKLLQ 304 

++I ++N D+DY++L Y+T++NSF++ WT + I +FE F+ LL 

Sbjct: 238 SHDIIETYCQLNFSAGDLDYLYLIYITANNSFASLQWTPEHIRQYCQLFEENDTFRLLLN 297 

Query: 305 PLKDALPLSGSYHDELVKVAI FFSE 329 

P+ LP LVK +FFS+ 

Sbjct: 298 PIITLLPNLKEQKASLVKALMFFSK 322 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1205

A DNA sequence (GBSxl281) was identified in S.agalactiae <SEQ ID 3751> which encodes the amino 
acid sequence <SEQ ID 3752>. This protein is predicted to be galactosyltransferase. Analysis of this protein 
sequence reveals the following: 

d N-terminal signal sequence (or aa 1-22) 



Final Results 

bacterial cytoplasm --- Certainty=0.1168 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB99071 GB:U67549 galactosyltransferase isolog [Methanococcus jannaschii]

Identities = 108/395 (27%), Positives = 196/395 (49%), Gaps = 28/395 (7%) 

Query: 4 KVKTVAVFSGyYLPFLGGIERYTDKMTADLVK-RGYRWIVTTNHGDLPIIDEDKGR--- 59 

K+K + +F GYY+P +GG+E + D+TL+ .Y + I N +P E + R 
Sbjct: 3 K1KLI-IFPGYYIPHIGGLETHVDEFTKHLSEDENYDIYIFAPN IPKYKEFEIRHNN 58 

Query: 60 -KIYRLPTKNIVKQRYPIINK-NREYNTLMKYVSDENIDFVICNTRFQLTTLEGLSFAKN 117 

K+YR P 1+ YP+ N N ++ + + + D V+ TRF TL G FAK 
Sbjct: 59 VKVYRYPAFEIIPN-YPVPNIFNIKFWRMFFNLYKIDFDIVMTRTRFFSNTLLGFIFAKL 117 

Query: 118 HHLPS- - IvIJOTGSSHFSVNNRFIiDFFGAIYEHLIjTARVKHYRPDFYAVSKRSVEWIiKHF 175 

I ++HGS+ + + F + Y+ + + A+SK ++ 

Sbjct: 118 RFKKKKLIHVEHGSAFVrajESEFKNKLSYFYDKTIGKLIFKKMDYVvAISKAVKNFILEN 177 

Query: 176 N1EAKGV- - IYNSVS ESLGSDFAGTAYLEKSADDIFITYAGRIIKEKGIELLLEAF 229 

+ K + IY + ES+G D EK + I + + GR+ K KG+E +++A+ 

Sbjct: 178 FVNDKDIPIIYRGLEIEKIESIGED---KKIKEKFKNKIKLCFVGRLYKWKG\'ENIIKAY 234 

Query: 230 S - -MSQYSENVYLQIAGDGPELAHLKE KYQSKQINFLGKLNFEQTMSLMAQTDIFVY 284 

E + L + GG+L LK+ Y+ IF GK++FE+ ++++ +DI+++ 
Sbjct: 235 VDLPKDLKEKIILIWGYGEDLERLKKIAGNYI^NGIYFTGKVDFEKAIAIVKASDIYIH 294 

Query: 285 PSMYPEGLPTS ILEAGLLSSAI IATDRGGTVEVIDSPELGI IMEENT - QSLHESLDLLVK 343 

S GL +S+L+A AI+A+ G EV+ GI++++N+ + + + L++ 

Sbjct: 295 SSYKGGGLSSSLLQAMCCGKAIVASPYEGADEWIDGYNGILLKDNSPEEIKRGIIKLIE 354 

Query: 344 DKALREKLQQNIAKRIKEHFTWEKTVEKLDYIIQK 378 

+ LR+ +N IKE+F W+K+V++ I ++ 
Sbjct: 355 NMNLRKI YGENAKNF I KENFNWKKSVKE YKKI FER 389 
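
The percentages in a summary line such as "Identities = 108/395 (27%)" can be rechecked from the raw counts. The Python sketch below parses one summary line and recomputes each percentage. The function names are illustrative, and truncating the percentage to an integer is an assumption that happens to reproduce the figures reported for this alignment; it is not taken from the alignment program's documentation.

```python
import re

# One "Name = count/length (pct%)" field of an alignment summary line.
SUMMARY = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY.findall(line)
    }


def percent(count, length):
    """Percentage truncated to an integer (assumed rounding rule)."""
    return count * 100 // length


line = "Identities = 108/395 (27%), Positives = 196/395 (49%), Gaps = 28/395 (7%)"
stats = parse_summary(line)
for name, (count, length, reported) in stats.items():
    # Print the reported percentage next to the recomputed one.
    print(name, f"{count}/{length}", reported, percent(count, length))
```

A mismatch between the reported and recomputed percentage is a quick way to flag a digit that was misread during text extraction.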

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3752 (GBS258) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 45 (lane 2; MW 43kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 7; MW 67.9kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1206 

A DNA sequence (GBSxl282) was identified in S.agalactiae <SEQ ID 3753> which encodes the amino 
acid sequence <SEQ ID 3754>. Analysis of this protein sequence reveals the following: 
Possible site: 31

d N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1182 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKYLAGIVTFNPNIERLDQNIRAIYPQVSHIYIVDNGSKNKEEISQLVADYlffiEGHLTVD 60 

M AGIV FNP+I+RL +NI A+ Q +H+Y+VDNGS N +E+ L+ YN+ +++ 
Sbjct: 1 MDISAGIVLFNPDIKRLKENIDAVIIQCTHLYLVDNGSGNVDEVKGLLNQYNQS-KISIIj 59 

Query: 61 YLTENKGIAYALNCIGQFAVAQEFDWFLTLDQDSWLGDLIDNYENYLHLPKVGMLSCLY 120 

+ EN+GIA ALN + A + FDW LTLDQDSW +++ +E Y++" VG+L + 
Sbjct: 60 WNRENQGIAKALNQLTSAAQKEGFDWILTLDQDSWPSNIVGEFEICYINNSSVGILCPII 119 

Query: 121 QDMNRENLVMQEFDYICEIEECITSAALMKTSVFE3TSGFAEEMFIDFVDSEMNYRLSEMG 180 

D N++ + D EI+ECITS +L+ + E GF E MFID VD ++ YRL + G 
Sbjct: 120 CDRNKDEEIKINEDCTEIDECITSGSLLNIKAWSEIGGFDERMFIDGVDFDICYRLRQRG 179 

Query: 181 YKTYQTOFIGLLHEIGHSSRVKKFGHVFHVIiNHSPFRKYYMIRNAIYIIKKYGKKKRYKY 240 

YK Y ++ + LLHE+GH + V NHS FRKYY+ RN IY KK 

Sbjct: 180 YKIYCIHSWLLHELGHIEYHRFLFWKVLVKNHSAFRKYYIARNIIYTAKKRRSTLLWK 239 

Query: 241 LVFMRNEFVRVLV-AEEQKSKKIVAMIKGLKDGLLMKV 277 

+ + + +++ EE K KI + +G+ DG KV 
Sbjct: 240 GLLQE1 KLIGIVI FYEEDKLNKIRCI CRGI YDGFKGKV 277 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1207

A DNA sequence (GBSxl283) was identified in S.agalactiae <SEQ ID 3755> which encodes the amino 
acid sequence <SEQ ID 3756>. This protein is predicted to be EpsU protein (rfbX). Analysis of this protein 
sequence reveals the following: 



Possible site: 54
>>> Seems to have an uncleavable N-term signal seq
[11 transmembrane entries; likelihood values illegible. Legible residue ranges: 357 - 373, 246 - 252,
144 - 160, 317 - 333, 245 - 263, 290 - 312.]

Final Results

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB52225 GB:Z98171 EpsU protein [Streptococcus thermophilus]
Identities = 189/462 (40%), Positives = 313/462 (66%)

Query: 1 MKLLKNMFYNTSYQLLTLLLPLVTVPYVSR\'LSPQC3IGINAyTSSIVMYFTLFGflLGISL 60
M+++KN YN YQ+ +++PL+T+PY+SR+L p GIGIN+YT+SIV YF LFG+4G+ L
Sbjct: 1 MQIVKNYLYWAIYQVFIIIVPLLTIPYLSRILGPSGIGINSYTNSIVQYFVLFGSIGLGL 60

[Sequence rows for the blocks starting at positions 61, 121, 181, 241, 301, 361 and 421 are illegible;
only the following match-line fragments remain.]

A A DISW F+G+E+FK+IV+RN IVKL+ + FL VK+ +DL +Y+ 4

DQSDKI++++ IV+A G V LPR+++ F+ + + K + +AIS+ M+ G++

V P++ +ES+ II I++ NA+G QYLL + + K+YT+S +IG

- N++LNI LI LGA+GA I+TVI+I

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1208 

A DNA sequence (GBSxl284) was identified in S.agalactiae <SEQ ID 3757> which encodes the amino 
acid sequence <SEQ ID 3758>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1742 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
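
A "Final Results" block like the one above can be read mechanically to pick out the predicted location. The sketch below is a minimal Python illustration, not part of the original analysis: the function names are hypothetical, and the pattern is deliberately tolerant of stray spaces inside the certainty value.

```python
import re

# location name, certainty (possibly containing stray spaces), verdict in parentheses
LINE = re.compile(r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9. ]+)\(([^)]*)\)")


def parse_final_results(text):
    """Map each location to (certainty, verdict), ignoring spaces in the number."""
    results = {}
    for location, raw, verdict in LINE.findall(text):
        results[location] = (float(raw.replace(" ", "")), verdict.strip())
    return results


def best_location(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda loc: results[loc][0])


sample = """\
bacterial cytoplasm --- Certainty=0. 1742 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""
results = parse_final_results(sample)
print(best_location(results), results)
```

Removing spaces before converting the number is a pragmatic choice for text recovered by character recognition; clean program output would not need it.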

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1209 

A DNA sequence (GBSxl285) was identified in S.agalactiae <SEQ ID 3759> which encodes the amino 
acid sequence <SEQ ID 3760>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1210 

A DNA sequence (GBSxl286) was identified in S.agalactiae <SEQ ID 3761> which encodes the amino 
acid sequence <SEQ ID 3762>. Analysis of this protein sequence reveals the following: 



Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.56 Transmembrane 214 - 230
INTEGRAL Likelihood = -1.33 Transmembrane 333 - 349
[Nine further transmembrane entries; likelihood values illegible. Legible residue ranges: 210 - 236,
361 - 386, 189 - 209, 364 -, 272 - 288, 191 - 207, 400 -.]

Final Results

bacterial membrane --- Certainty=0.[illegible]25 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1211 

A DNA sequence (GBSxl287) was identified in S.agalactiae <SEQ ID 3763> which encodes the amino 
acid sequence <SEQ ID 3764>. This protein is predicted to be rhamnosyltransferase. Analysis of this 
protein sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1792 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9981> which encodes amino acid sequence <SEQ ID 9982>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF18951 GB:AF155805 Cps9H [Streptococcus suis] 
Identities = 53/116 (45%) , Positives = 75/116 (63%) , Gaps = 4/116 (3%) 

Query: 6 VIMATYNGQGFIHDQLDSIRNQTLRPDYA/LI'ISDDSSTDDTVKVVEDV'IKEHRLDGWSITS 65 

VLMATYNG FI QLDSIRNQ++ D V++ DD STDDT+K+++DYIK++ LD W ++ 
Sbjct: 4 VLMATYNGSPFI I KQLDS IRNQSVSADKVI IWDDCSTDDT I KI I KDYIKKYSLDSWWSQ 63 

Query: 66 NDKNLGWRLNFRQLL IDVIAYEVDYVFFSDQDDTWYHHKNKMQVD IMEERQD INLL 121 

N N G F L + VFFSDQDD W HK + +1 +R++++++ 

Sbjct: 64 NKSNQGHYQTFINL TKLVQEGIVFFSDQDDIWDCHKIETMLPIF-DRENVSMV 115 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1212 

A DNA sequence (GBSxl288) was identified in S.agalactiae <SEQ ID 3765> which encodes the amino 
acid sequence <SEQ ID 3766>. This protein is predicted to be rhamnosyltransferase. Analysis of this 
protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1278 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9983> which encodes amino acid sequence <SEQ ID 9984> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 10 vXjMATYNGEIFISEQLDSIRQQTLKPDYVIjLRDDCSTDETWr/VNNYIAKHELEGWKIVK 69 

VLMATYNG FI +QLDSIR Q++ D V+-f DDCSTD+T+ ++ +YI K+ L+ W + + 
Sbjct: 4 VLMATYNGSPFIIKQLDSIRNQSVSADKVIIWDDCSTDDTIKIIKDYIKKYSLDSWWSQ 63 

Query: 70 OTKHLGJTOLNFRQLLIDvlAYEvrJYVFFSDQDDITOLDIOTERQFAIMSDKPQIEVLSADV 129 

N N G F L + VFFSDQDDIW K E I D+ + + V 
Sbjct: 64 NKSNQGHYQTFINL TKLVQEGIVFFSDQDDIKDCHKIETMLPIF-DRENVSM V 115 

Query: 130 DIKTMSTEASVPHFLTFSSSDRISQY 155 

K+ + + + +SDRI+ Y 

Sbjct: 116 FCKSRLIDENGNIISSPDTSDRINTY 141 
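
The "Identities" count of an alignment is the number of columns in which the query and subject rows carry the same residue. The Python sketch below recounts it for the last block shown above (query 130 - 155 against subject 116 - 141), which contains no gaps. The function name is illustrative. "Positives" cannot be recounted this way, because that figure depends on the substitution matrix used by the alignment program, which is not reproduced here.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # A gap character ("-") never counts as an identity.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Final block of the alignment above.
query = "DIKTMSTEASVPHFLTFSSSDRISQY"
subject = "FCKSRLIDENGNIISSPDTSDRINTY"

print(count_identities(query, subject), "identical columns out of", len(query))
```

The identical columns found this way (K, S, D, R, I, Y) are the letters printed on the middle line of that block.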

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1213 

A DNA sequence (GBSxl289) was identified in S.agalactiae <SEQ ID 3767> which encodes the amino 
acid sequence <SEQ ID 3768>. This protein is predicted to be dTDP-glucose 4-6-dehydratase (galE). 
Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.02 Transmembrane 250 - 266 ( 250 - 266)

Final Results

bacterial membrane --- Certainty=0.1808 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9985> which encodes amino acid sequence <SEQ ID 9986> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAC14890 GB:AJ295156 d-TDP-glucose dehydratase [Phragmites australis]

Identities = 108/327 (33%), Positives = 170/327 (51%), Gaps = 22/327 (6%)



Query: 29 ANKGVLISGSNSMIiASYMVFLLAyiiNETKNyQTQIIATARNIEKARDKFSDLVGKDYFTL 88 

AM +L++G + S++V L N + ++I ++D +G F L 

Sbjct: 33 ANLRILVTGGAGFIGSHLVDKLM ENEKHEVIVADNFFTGSKDNLKKWIGHPRFEL 87 

Query: 89 I PYDVEERLEYDGKVDYI IHAASNAS PTAILSNPVS 1 1 KANTIGTLNLLDFAKEKTIENF 148 

I+DV + L + VDIHA ASP NPV IK N IGTLN+L AK + 

Sbjct: 88 IRHDVTQPLLVE- -VDQIYHLACPASPIFYKHNPVKTIKTMVIGTLNMLGLAK-RVGARI 144 

Query: 149 LFLSTREVYGTSIKEVIDEEAYGGFDILATRACYPESKRMAETLLQSYYDQYKVPFTIAR 208 

L ST EVYG ++ E 4G + + R+CY E KR+AETL+ Y+ Q+ + IAR 
Sbjct: 145 LLTSTSEWGDPLEHPQTEAYWGNVNPIGVRSCYDEGKRVAETLMFDYHRQHGIEIRIAR 204 



Query: 209 IAHSFGPGMELGNDGRIMNULLSNVIDGKDIVLKSSGTAERAFCYLADAVSGLFTILLNG 258 

I +++GP M + +DGR++++ ++ + G + ++ GT R+FCY+AD V GL L+NG 
Sbjct: 205 IFNTYGPRMNI-DDGRVVSNFIAQAvRGDPLTVQKPGTQTRSFCYVADMVDGLIK-LMNG 262 

Query: 269 FA/GCjAYNVATsFEDQPIMIKDIAQKLvDLFSDKNISVVFDIPKTMSAGYSKMGRTR LTM 325 

N+ N + M4- +LA+K+ +L + ++ TM+ R R +T 

Sbjct: 263 NNTGPINLGNPGEFTML-ELAEKVKELINP EVTVTMTENTPDDPRQRKPDITK 314 

Query: 326 AKLEALGWKREVSLESGILKTVQAFEE 352 

AK E LGW+ +V L G++ F E 

Sbjct: 315 AK-EVLGWEPKWLRDGLVLMEDDFRE 340 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1214 

A DNA sequence (GBSxl290) was identified in S.agalactiae <SEQ ID 3769> which encodes the amino 
acid sequence <SEQ ID 3770>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9987> which encodes amino acid sequence <SEQ ID 9988>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11866 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis]
Identities = 77/231 (33%), Positives = 131/231 (56%), Gaps = 6/231 (2%)

Query: 13 VIFAGGVGRR^KGKPKQFLEVHGKPIIVHTIDIFQNTEAIDAVWVCVSDWLDYMNNL 72 

VI A G G+RM G+ K F+E+ G P+1+HT+ +F + D +++V ++ L 

Sbjct: 6 VIPAAGQGKRMKa-GRNKLFIELKGDPVIIHTLRVFDSHRQCDKIILVINEQEREHFQQL 64 

Query: 73 VERFNLTKVKAWAGGETGQMSIFKGLEAAEQIATDDAWLIHDGVRPLINEEVINANIQ 132 

+ + +VAGG+ Q S++KGL+A +Q + +VL+HDG RP I E 1+ I 
Sbjct: 65 LSDYPFQTSIELVAGGDERQHSVYKGLKAVKQ EKIVLVHDGARPFIKHEQIDELIA 120 

Query: 133 SVKETGSAVTSVRAKETVVLVNDSSKISEVVDRTRSFIAKAPQSFYLSDILSVERDAISK 192 

++TG+A+ +V K+T+ V D ++SE ++R+ + + PQ+F LS ++ +A K 
Sbjct: 121 EAEQTGAAILAVPVKDTIKRVQDL-QVSETIERSSLWAVQTPQAFRLSLLMKAHAEAERK 179 

Query: 193 GITDAIDSSTLMGMTORELTIVEGPYENIKITTPDDFYMFKALYDARENEQ 243 

G D+S + M + +VEG Y NIK+TTPDD +A+ ++ + 

Sbjct: 180 GFLGTDDASLVEQMEGGSVRWEGSYTNIKLTTPDDLTSAEAIMESESGNK 230 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3770 (GBS647) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 130 (lanes 9 & 10; MW 55.9kDa + lane 8; MW 27kDa) and in Figure 186 (lane 5;
MW 56kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 130 (lane 12; MW 31kDa), in Figure 140 (lane 9; MW 31kDa) and in Figure
178 (lane 6; MW 31kDa).

Purified GBS647-GST is shown in Figure 243, lane 4; purified GBS647-His is shown in Figure 229, lane 6.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1215 

A DNA sequence (GBSxl291) was identified in S.agalactiae <SEQ ID 3771> which encodes the amino 
acid sequence <SEQ ID 3772>. This protein is predicted to be LicD1. Analysis of this protein sequence
reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2647 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9989> which encodes amino acid sequence <SEQ ID 9990> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD37094 GB:AF106539 LicD2 [Streptococcus pneumoniae] 
Identities = 85/271 (31%) , Positives = 130/271 (47%) , Gaps = 15/271 (5%) 

Query: 1 MKEMWSEIREVQLEMIAYIDKVARDNKIEYSLGGGSLLGAMRHKGFIPWDDDIDLMLER 60 

M+ + EI+E+QL +L YID+ + + I Y L G++LGA+RHKG IPWDDDID+ L R 
Sbjct: 1 MQYLEKKEIKEIQIALLDYIDETCKKHDIPYFLSYGTMLGAIRHKGMIPWDDDIDISLYR 60 

Query: 61 SQYERLMKALADANNSDFKLLHHSVEKNLW PFAKLYHTKSMYLSKTDRIHPWTGIFI 117 

YERL+K + + N+ +K+L S + + W FA + T ++ T +FI 

Sbjct: 61 EDYERLLKI IEEENHPRYKVL - - S YDTSSWYFHNFAS ILDTSTVIEDHVKYKRHDTSLFI 118

Query: 118 DIFPLDRLPESAEERQRFFKKVHSAAANLMCTTYPNFASGSRKLYANARLILGLP-RFIA 176

D+FP+DR + +++ +AL G KL RL RF+

Sbjct: 119 DVFPIDRFTDLSIVDKSY---KWALRQLA.YIKKSRAVHGDSKLKDFLRLCSWYALRFVN 175

Query: 177 YHGQAKKRAEIVDQVMETYNNQEVPYMGYTD-SRYRLKEYFPREIFSEYEDVMFENIKTR 235
KK +DQ+++ Y G + +KE FP + F E FE

Sbjct: 176 PRYFYKK IDQLVKNAVTNTPQYEGGVGIGKEGMKEIFPVDTFKELILTEFEGRMLP 231

Query: 236 KI KNEHAYLNQLYGGSYMELPPES KRESHSY 266
K +L Q+Y G YM P + +E +S+

Sbjct: 232 VPKKYDQFLTQMY-GDYMTPPSKEMQEWYSH 261

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1216 

A DNA sequence (GBSxl292) was identified in S.agalactiae <SEQ ID 3773> which encodes the amino 
acid sequence <SEQ ID 3774>. Analysis of this protein sequence reveals the following: 

Possible site: 18
>>> May be a lipoprotein

INTEGRAL Likelihood =-12.05 Transmembrane 554 - 570 ( 547 - 575)

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
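
For the records in this section that list a single transmembrane segment and an affirmative membrane result, the reported certainty tracks the segment likelihood closely. The Python sketch below checks the relation certainty ≈ 0.1 − 0.04 × likelihood against the value pairs reported in this section. This relation is an empirical observation from the listed numbers only; it is an assumption, not a documented property of the prediction program, and the function names are illustrative.

```python
# (likelihood of the listed transmembrane segment, reported membrane certainty)
PAIRS = [
    (-12.05, 0.5819),
    (-2.92, 0.2168),
    (-2.02, 0.1808),
    (-1.91, 0.1765),
    (-1.38, 0.1553),
    (-0.80, 0.1319),
]


def estimated_certainty(likelihood):
    """Assumed linear relation fitted by eye to the values listed above."""
    return 0.1 - 0.04 * likelihood


def max_deviation(pairs):
    """Largest absolute gap between the estimate and the reported certainty."""
    return max(abs(estimated_certainty(lik) - cert) for lik, cert in pairs)


print(max_deviation(PAIRS))
```

A record whose reported pair falls far outside this relation is a candidate for a misread digit rather than evidence about the protein itself.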

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3774 (GBS182d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 184 (lane 8; MW 62kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1217 

A DNA sequence (GBSxl293) was identified in S.agalactiae <SEQ ID 3775> which encodes the amino
acid sequence <SEQ ID 3776>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4653 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1218

A DNA sequence (GBSxl294) was identified in S.agalactiae <SEQ ID 3777> which encodes the amino 
acid sequence <SEQ ID 3778>. This protein is predicted to be DOLICHYL-PHOSPHATE MANNOSE 
SYNTHASE RELATED PROTEIN. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.92 Transmembrane 232 - 248 ( 231 - 248)

Final Results

bacterial membrane --- Certainty=0.2168 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9991> which encodes amino acid sequence <SEQ ID 9992>
was also identified.



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC35924 GB:AF071085 putative glycosyl transferase [Enterococcus faecalis]

Identities = 118/240 (49%), Positives = 152/240 (63%), Gaps = 1/240 (0%)

Query: 14 KILLVIPATOEEGSIAKTVQTIVDFKASRS-LPFELDYIVIMDGSTDGTPELLDRLGLNH 72
K+LL+IPAYNEE +1 +T+ +1 FK + ELDY+VINDGSTDGT ++L+ +N
Sbjct: 2 KVLLIIPAYNEEENILRTIASIETFKQEVTHFQHELDYWINDGSTDGTKQILEVNQINA 61

Query: 73 IDLVQNLGIGGCTQTGYLYANRNHYDVaVQFDGDGQHDIRSIEDVVMPILNDEADFVIGS
I LV NLGIGG VQTGY YA N YDVA QFDGDG HDI S+ ++ P+ F GS
Sbjct: 62 IHLVLNLGIGGAVQTGYKYALENEYDVAXQFDGDGXHDIXSLPILLEPLAEGXCXFSXGS 121

Query: 133 RFVDKKHQNFQSTAMRRLGINLISAAIKLTTGHKVYDTTSGYRAANAALIAYIiSCHYPVQ 192
RF+ +FQS MRR GI L+S G +Y T G RA N +IA+ + YP
Sbjct: 122 RFIPGNXASFQSXKMRRXGIRLLSFCXXXAXGXTIYXVTXGXRAGNRKVIAFFAKRYPTN 181

Query: 193 YPEPESTARILKKGYRLKEVTANMFEREAGTSSISSLKSI FYMTDVLTS 1 1 IAGFIKEDD 252
YPEPES ++KK + + E NM ER G SSI +L S+ YM +V ++I + IA F+KE D
Sbjct: 182 YPEPESIVHLIKKKFVIWRPVNMMERLGGVSSIRALASVKYMLEVGSAILIAPFMraGD 241


A related DNA sequence was identified in S.pyogenes <SEQ ID 3779> which encodes the amino acid 
sequence <SEQ ID 3780>. Analysis of this protein sequence reveals the following: 

Possible site: 56
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 211 - 227 ( 211 - 227)

Final Results

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC35924 GB:AF071085 putative glycosyl transferase [Enterococcus faecalis]

Identities = 104/233 (44%), Positives = 134/233 (56%), Gaps = 9/233 (3%)

Query: 1 VKKLI I IPAYNESSNIVNTIRTIESDAPD FDYIIIDDCSTDNTLAICQKQGFN 53 

+K L+ 1 1 PAYNE NI+ TI +IE+ + DY++I+D STD T I + N 

Sbjct: 1 MKVLLIIPAYNEEENILRTIASIETFKQEVTHFQHELDYWINDGSTDGTKQILEVNQIN 60

Query: 54 VISLPINLGIGGiAVQTGYRYAQRCGYDVAVQVDGDGQHNPCYLEKIWEVLVQSSVNMVIG 113 

I L +NLGIGGAVQTGY+YA YDVA Q DGDG H+ L ++E L + G 
Sbjct: 61 AIHLVLNLGIGGAVQTGYKYALENEYDVAXQFDGDGXHDIXSLPILLEPLAEGXCXFSXG 120

Query: 114 SRFI - -TKEGFQSSFARRIGIKYFTKLIALLTGKKITDATSGLRLIDRSLIERFANHYPD 171

SRFI FQS RR GI+ ++ G I T G R +R +1 FA YP 

Sbjct: 121 SRFIPGNXASFQSXKMRRXGIRIiSFOCXXMGXTIYXVTXGXRAGNRKVIAFFAKRYPT 180 

Query: 172 DYPEPETWDVLVSHFKVKEIPVVMNERQGGVSSISLTKSVYYMIKVTLAILV 224 

+YPEPE++V ++■ F + E PV M ER GGVSSI SV YM++V AIL+ 
Sbjct: 181 OTPEPESIVHLIKKRFVITORPVNMMERMGVSSIRALASVKYMIjEVGSAILI 233 

An alignment of the GAS and GBS proteins is shown below.

Identities = 105/231 (45%), Positives = 142/231 (61%), Gaps = 8/231 (3%)

Query: 14 KILLVIPAYNEEGSIAKTVQTIVDFKASRSLPFELDYIVIMDGSTDGTPELLDRLGIiNHI 73 

K L++IPAYNE +1 T++TI S + DY1 + I+D STD T + + G N I 

Sbjct: 2 KKLIIIPAYNESSNIVNTIRTI ESDAPDFDYI I IDDCSTDNTLAI CQKQGFNVT 55 

Query: 74 DLVQNLGIGGCVQTGYLYANPlilHYDVAVQFDGDGCKDIRSIEDVVMPILNDEADFVIGSR 133 

L NLGIGG VQTGY YA R YDVAVQ DGDGQH+ +E +V ++ + VIGSR 
Sbjct: 56 SLPINLGIGGAVQTGYRYAQRCGYDVAVQVDGDGQHNPCYLEKMVEVLVQSSVNMVIGSR 115 

Query: 134 FVDKKHQNFQSTAMRRLGimiSAAIKlTTGHKVYDTTSGYRAANAALIAYLSCHYPVQY 193 

F+ K + FQS+ RR+GI + I L TG K+ D TSG R + +LI + HYP Y 
Sbjct: 116 FITK--EGFQSSFARRIGIKYFTWL1ALLTGKKITDATSGLRLIDRSLIERFANHYPDDY 173 

Query: 194 PEPESTARILKKGYRLKEOTANMFEREAGTSSISSLKSIFYMTDVLTSIII 244 

PEPE+ +L +++KE+ M ER+ G SSIS KS++YM V +I++ 
Sbjct: 174 PEPETVVDVLVSHFKAn<EIPvV^RQGGVSSISLTKSVYYMIKVTLAILV 224 

A related GBS gene <SEQ ID 8751> and protein <SEQ ID 8752> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 0.29 
GvH: Signal Score (-7.5): -4.34 

Possible site: 29
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -2.92 threshold: 0.0 

INTEGRAL Likelihood = -2.92 Transmembrane 222- 238 ( 221 - 238) 
PERIPHERAL Likelihood =4.40 4 
modified ALOM score: 1.08 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2168 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00548 (340 - 1056 of 1359) 

GP|3608398|gb|AAC35924.l| |AF071085(2 - 241 of 241) putative glycosyl transferase 
{Enterococcus faecalis} 
%Match =24.7 

% Identity =49.2 %Similarity =64.2
Matches = 118 Mismatches = 85 Conservative Sub.s = 36 

249 279 309 339 369 399 429 456 

L*QD*GGYGNMVIAKINLSIKLCLNG*XQQI IXIRDKMMKKILLVIPAYNEEGSIAKTVQTIVDFKASRS-LPFELDYIV 
1=11=1111111 =1 =1= =1 II = = 1111=1 
MKVLLIIPAYNEEENILRTIASIETFKQEVTHFQHELDYW 



INDGSTDGTPELLDRLGLNHIDLVQNLGIGGCVQTGYLYA^TOTH^ 
Mill! ==|: =1 I || 111111 Hill II I MM llllll III 1= == 1= I ||
INDGSTDGTKQILEWQINAIHIOTjNLGIGGAVQTGYIWA^



RFVDKKMQNFQSTJWRRLGINLISAAIKLTTGHKVYDTTSGyRAANAALIAYLSCinfPVQYPEPESTARILKKGYRLKEV 
11= =111 III II hi I =1 I I II I :||::= II MM ::|| : . | 

RFIPGNXASFQSXKMRRXGIRLLSFaCXXAXGXTIY^ 

140 150 160 170 180 190 200 

966 996 1026 1056 1086 1116 1146 1176 

TANMKEREAGTSSISSLKSIFYMTDVLTSIIIAGFIKEDDK*V*HCKI I KCLF*PLSYFI*L*EWLIKTHFLrjJVIiYLGY* 

II II I III = I |: II :| -=1-11 1=11 I 
PVNMMERLGGVSSIRALASVKYMLEVGSAILIAPFMKEGD 
220 230 240 

SEQ ID 8752 (GBS355) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 4; MW 27kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 7; MW 52kDa). 

GBS355-GST was purified as shown in Figure 213 (lane 4) and in Figure 216 (lane 6). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1219 

A DNA sequence (GBSxl295) was identified in S.agalactiae <SEQ ID 3781> which encodes the amino
acid sequence <SEQ ID 3782>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.91 Transmembrane 185 - 201 ( 185 - 201)

Final Results

bacterial membrane --- Certainty=0.1765 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKVNILMATYNGEKFLAQQIESIQKQTFKEWNLLIRDDGSSDKTCDIIRNFTAKDSRIRF 60
MKVNILM+TYNG++F+AQQI+SIQKQTF+ WNLLIRDDGSSD T II +F D+RIRF
Sbjct: 1 MKVNILMSTYNGQEFIAQQIQSIQKQTFENWKLLIRDDGSSDGTPKI IADFAKSDARIRF 60

Query: INENEHHNLGVIKSFFTLVNYEVADFYFFSDQDDWLPEKLSVSLEAAKHKASDVPLLVY 120
IN ++ N GVIK+F+TI1+ YE AD+YFFSDQDDVWLP+KL ++L + + + + +PL+VY
Sbjct: INADKRENFGVIIOTFYTLLKirEKADYYFFSDCDDVVILPQKIiELTLASVEKENNQIPLMVY 120

Query: 121 [sequence row illegible]
TDL W+++L +L DSMI+ QSHHANT+LL ELTENTVTGGTMM+NH LA++W +D+
Sbjct: 121 [sequence row illegible]

Query: 180 [sequence row illegible]
+MHDW+LALLAASLG++IYLD T+LYRQH++NVLGART KR K LR P
Sbjct: 181 [sequence row illegible]

Query: 239 [sequence row illegible]
D+ AN +1+ ++ + Q F+ R++WL +YG++KN+
Sbjct: 239 [sequence row illegible]

Query: 299 WFKWLIATNYYNKR 313
VFK LI T + +R
Sbjct: 296 FVFKTLIITKFGYRR 310

A related DNA sequence was identified in S.pyogenes <SEQ ID 817> which encodes the amino acid
sequence <SEQ ID 818>. Analysis of this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1980 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 178/314 (56%), Positives = 232/314 (73%), Gaps = 6/314 (1%)



Query: 1 MKVNILMATYNGEKFLAQQIESIQKQTFKEVCILLIRDDGSSDKTCDIIRNFTAKDSRIRF 60 

M +NIL++TYNGE+FLA+QI+SIQ+QT +W LLIRDDGS+D T DIIR F +D RI++ 
Sbjct: 1 MNINILLSTYNGERFLAEQIQSIQRQTVNDWTLLIRDDGSTDGTQDIIRTFVKEDKRIQW 60 

Query: 61 INENEHHNLGVII<SFFTLvNYEVADFyFFSDQDDvV(LPEKLSVS-LEAAKHKASDVPLLV 119 

INE + NLGVIK+F+TL+ ++ AD YFFSDQDD+WL KL V+ LEA KH+ + PLLV 
Sbjct: 61 INEGQTENLGVIKNFYTLLKHQKADVYFFSDQDDIWLDNKLEVTLLEAQKHEMT-APLLV 119 

Query: 120 YTDLKVVNQELNILQDSMIRAQSHHAm'TLLPELTENTvTGGTMMINHALAEKWFTP^I 179 

YTDLKW Q L + DSMI+ QS HANT+LL ELTENTVTGGTMMI HALAE+VI T + + 
Sbjct: 120 YTDLKVVTQHLAVCHDSMIKTQSGEMrrSLLQELTENTVTGGTMMITHALAEEWTTCDGL 179 

Query: 180 LMHDWFLALLAASLGEIIYLDLPTQLYRQHDNNVLGARTMDKRFKILREGPKSIFTRYWK 239 

LMHDW+LALLA+++G+++YLD+PT+LYRQHD NVLGART KR K P + +YW 

Sbjct: 180 LMHDWYLMjLASAIGKLvYLDIPTELYRQHDANVI/SARTWSKRMKNWLT-PHHLVNKYWW 238 

Query: 240 LIHDSQKQASLIVDKYGDIMTANDLELIKCFIKIDKQPFMTRLRWLWKYGYSKNQFKHQV 299 

LI SQKQA L++D + ND EL+ ++ + PF RL L +YG+ KN+ H 

Sbjct: 239 LITSSQKQAQLLLDL PLKPNDHELVTAYVSLLDMPFTKRLATLKRYGFRKURIFHTF 295 

Query: 300 VFKWL IATNYYNKR 313 

+F+ L+ T + +R 
Sbjct: 296 I FRSLWTLFGYRR 309 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1220 

A DNA sequence (GBSxl296) was identified in S.agalactiae <SEQ ID 3783> which encodes the amino 
acid sequence <SEQ ID 3784>. This protein is predicted to be rgpAc. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1881 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9993> which encodes amino acid sequence <SEQ ID 9994>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA32089 GB:AB010970 rgpAc [Streptococcus mutans]
Identities = 234/362 (64%), Positives = 284/362 (77%) 

Query: 33 VSELINHQKSFDIKYHVACLSDKEHHTHFNPAD&DCFTINPPQLGPARVIAYDIMAINYA 92 

+ EL+ +++S + YHVACLS+ + H HF + DCFTI P+LGPARVIAYD+MAI YA 
Sbjct: 1 MEELVKYKQSQQLTYHVACLSETDQHKHFTYLGVDCFTIKAPKLGPARVIAYDMMAIRYA 60 

Query: 93 LDLVKTHDLKEPIFYILGNTIGAFIWHFANKIHKVGGLLYVNPDGLEWKRSKWSRPTQRY 152 

L L+K +K PIFYILGNTIGAF+ FA KI ++GG Y+NPDGLEW+RSKWSRP Q Y 
Sbjct: 61 LKLIKDQKIKHPIFYILGNTIGAFMGPFARKIKRIGGRFYINPDGLEWRRSKWSRPVQAY 120 

Query: 153 LKYAEKCMTKNADLIISDNIGIENYIQSTYSNVKTRFIAYC-TEINSRKLSSDDPRVKQLF 212 

LKYAEKCMTK ADL+ISDN GIE YI+ Y KT FIAYGT+++ L +D +VK + 
Sbjct: 121 LKYAEKCMTKKADLV1SDNTGIEGYIKQMYPWAKTTFIAYGTDLSPSGLLKNDSKVKDFY 180 

Query: 213 KKMNIKSKGYYLIVGRFVPENNYETAIREF^SDTKRDLVIICNHQNNPYFEKLSLKTNL 272 

KKW IK KGYYLIVGRFVPENNYETAIREFM S ++RDLVIICN++ N YFE L KT 
Sbjct: 181 KKWAIKEKGYYLIVGRFVPENNYSTAIREFMTSSSERDLVIICNYEGNAYFEDLRQKTEF 240 

Query: 273 QQDKKVKFVGTLYEKDLLDYWQQAFAYIKGHEVGGTNPGLLEALANTDLNLVI.DVDFNK 332 

+DKR+KFVGT+Y++ LL Y+R+QAFAYIHGHEVGGTNPGLLEALA+TDLNLVL +FN 
Sbjct: 241 DKDKRIKFVGTVYDRPLLTYIREQAFAYIHGHEVGGTNPGLLEALAHTDLNLVLITEFNY 300 

Query: 333 SVAGLSSFYWAKKEGDLAKLINDSDQQQDLSTYGDRAKAIIQENYTWKKIVEEYEDLFLN 392 

+VA ++ YW + G LA+LIN D+Q++ + YG RAK II YTW+KIVEEYEDLFL+ 
Sbjct: 301 TVALDAARYWTQDNGSLAQLINQFDKQENFAEYGQRAKEIIVNYYTWEKIVEEYEDLFLH 360 

Query: 393 ES 394
ES 

Sbjct: 361 ES 362 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3785> which encodes the amino acid 
sequence <SEQ ID 3786>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood = -1.38 Transmembrane 95 - 111 ( 95 - 111) 



Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 250/383 (65%), Positives = 307/383 (79%)

Query: 11 MQDVFI IGSRGLPARYGGFETFVSELINHQKS FDI KYHVACLSDKEHHTHFNFADADCFT 70 

MQDVFI IGSRGLPA+YGGFETFV ELI+HQ S +I+YHVACLSD +H HF++ ADCF 
Sbjct: 1 MQDVFI IGSRGLPAKYGGFETFVEELISHQSSKNIRYHVACLSDTKHKVHFDYKGADCFY 60 

Query: 71 INPPQLGPARVIAYDIMAINYALDLVKTHDLKEPIFYILGNTIGAFIWHFANKIHKVGGL 130 

+NPP+LGPARVIAYD+MAI YAL H ++ PIFY+LGNT+GAFI F +IH GG 

Sbjct: 61 LNPPKLGPARVIAYDI^IAITYALSYSEQHQIQNPIFYVIjGNTVGAFIAPFVKQIHNRGGR 120 

Query: 131 LYVNPDGLEWKRSKWSRPTQRYLKYAEKCMTKNADLIISDNIGIENYIQSTYSNVKTRFI 190 

++NPDGLEWKRSKWSRP Q YLK++EK MT+ ADL+ISDNIGI+ Y++ Y KT FI 
Sbjct: 121 FFINPDGLEWKRSKWSRPVQAYLKFSEKQMTRQADLVISDNIGIDRYLKQVYPWSKTCFI 180 

Query: 191 AYGTEINSRKLSSDDPRWQLFKKWNIKSKGYYLIVGRFVPENNYETAIREFMASDTKRD 250 

AYGT+ +L++ D +V+ F+ ++I+ K YYLI+GRFVPENNYETAI+EFMAS TKRD 
Sbjct: 181 AYGTQTQPSRLATADSKVRAYFQTFDIREKDYYIiILGRFVPENNYETAIKEFMASSTKRD 240 

Query: 251 LVIIf^QNNPYFEKLSLKTNLMDKRvICWGTLYEKDLLDYWQQAFAYIHGHEVGGTN 310 

LVIICNH+ N YF++L +T +D R+KFVGTLY+K+LL Y+R+QA+AYIHGHEVGGTN 
Sbjct: 241 LVI I CNHEGNAYFKQLLAETECDKDPRI KFVGTLYDKELLAYIREQAYAYIHGHEVGGTN 300 

Query: 311 PGLLEALANTDLNLVLDVDFNKSVAGLSSFYWAKKEGDLAKLINDSDQQQDLSTYGDRAK 370

PGLLEAIA+T+LNLVL VDFN+SVA ++ YW K++G LA+LIN D D G AK 
Sbjct: 301 PGLLEAIAHTNiMjVLGVDFNQSVAKSAM.YWTKQKGQLAELINQVDAGFDSDHLGKEAK 360 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1221 

A DNA sequence (GBSx1297) was identified in S.agalactiae <SEQ ID 3787> which encodes the amino
acid sequence <SEQ ID 3788>. This protein is predicted to be dTDP-L-rhamnose synthase. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1059 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MILITGANGQLGSELRHLLDERTQEYVAVDVAEMDITNAEMVDKVFEEVKPSLVYHCAAY 60
MILITGANGQLG+ELR+LLDER +EYVAVDVAEMDIT+AEMV+KVFEEVKP+LVYHCAAY
Sbjct: 1 MILITGANGQLGTELRYLLDERNEEYVAVDVAEMDITDAE^KVFEEVKPTLVYHCAAY 60

Query: 61
TAVDAAEDEGKELDFAINVTGT+NVAKA+ KH ATLVYI STDYVFDG+KPVGQEWEVDD
Sbjct: 61

Query: 121
Sbjct: 121

Query: 181
Q+GRPTWTRTLAEFMTYIAEN+K+FGYYH^SNDA EDTTWYDFAVEILKDTDVEVKPVDS
Sbjct: 181

Query: 241
SQFPAKAKRPLNSTMSL KAKATGFVIPTWQDAL+EFYKQEV+
Sbjct: 241 SQFPAKAKRPLNSTMSLAKAKATGFVIPTWQDALQEFYKQEVR 283

A related DNA sequence was identified in S.pyogenes <SEQ ID 3789> which encodes the amino acid 
sequence <SEQ ID 3790>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0618 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/284 (79%), Positives = 248/284 (86%)

Query: 1 MILITGANGQLGSELRHLLDERTQEYVAVDVAEMDITNAEMVDKVFEEVKPSLVYHCAAY 60

MILITG+NGQLG+ELR+IiLDER +YVAVDVAEMDITN + V+ VP +VKP+LVYHCAAY 
Sbjct: 21 MILITGSNGQLGTELRYLLDERGVDYVAVDVAEM3ITNEDKVEAVFAQVKPTLVYHCAAY 80 

Query: 61 TAVDAAEDEGKELDFAINVTGTEtWAKAAAKHDATLVYISTDYVFDGEKPVGQEWEVDDL 120 

TAVDAAEDEGK L+ AINVTG+EN+AKA K+ ATLVYI STDYVFDG KPVGQEW D 
Sbjct: 81 TAVDAAEDEGKALNEAINVTGSENIAKACGKYGATLVYISTDYVFDGNKPVGQEWVETDH 140 

Query: 121 PDPKTEYGRTKRMGEELVEKYTSKFYT I RTAWVFGNYGKNFVFTMQNIAKTHKTLTWND 180 

PDPKTEYGRTKR+GE VE+Y FY IRTAWVFGNYGKNFVFTM+ LA+ H LTWND 
Sbjct: 141 PDPKTEYGRTKRLGEIAVERYAEHFYIIRTAWVFGNYGKNFVFTMEQLAENHSRLTWND 200 

Query: 181 QHGRPTWTRTIAEFMTYLAENQKDFGYYHLSNDAKEDTTWYDFAVE1LKDTDVEVKPVDS 240 

QHGRPTWTRTLAEFM YL ENQK FGYYHLSNDAKEDTTWYDFA EILKD VEV PVDS 
Sbjct: 201 QHGRPTWTRTLAEFMCYLTENQKAFGYYHLSNDAKEDTTWYDFAKE1LKDKAVEWPVDS 260 

Query: 241 SQFPAKAKRPMSTMSLEKAKATGFVIPTWQDALKEFYKQEVKK 284 

S FPAKAKRPLNSTM+L+KAKATGFVI PTVfQ+ALK FY+Q +KK 
Sbjct: 261 SAFPAKAKRPLNSTMNLDKAKATGFVIPTTJQEALKAFYQQGLKK 304 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1222 

A DNA sequence (GBSx1298) was identified in S.agalactiae <SEQ ID 3791> which encodes the amino
acid sequence <SEQ ID 3792>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2554 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA21508 GB:AB000631 unnamed protein product [Streptococcus mutans]
Identities = 92/108 (85%), Positives = 100/108 (92%)

Query: 5 KQYSEEEVGKIKDRILEALEMVIDPELGIDIVKLGLIYEIRFEDNGRTEIDMTLTTMGCP 64 
K Y+ EE+ KIKDRILEALEMVIDPELGIDIVNLGLIY+IRFED+GRTEIDMTLTTMGCP

Sbjct: 4 KNYTPEEIAKIKDRILEALEMVIDPELGIDIVNLGLIYDIRFEDSGRTEIDMTLTTMGCP 63 

Query: 65 LADLLTDQIHDVMKWPEVTETEVKLVWYPAWSVDKMSRYARIALGIR 112 
LADLLTDQIHD +K VPEV + +VKLVW PAW+VDKMSRYARIALGIR 
Sbjct: 64 LADLLTDQIHDALKDVPETODIDvTO.WSPAKTVDKT'ISRYARIALGIR 111

A related DNA sequence was identified in S.pyogenes <SEQ ID 3793> which encodes the amino acid
sequence <SEQ ID 3794>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2818 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/112 (80%), Positives = 102/112 (90%) 

Query: 1 MSEVKQYSEEEVGKIKDRILEALEMVIDPELQIDIVNLGLIXEIRFEIHIGRTEIDMTLTT 60

MS+ +Y++++V IK+RILEALE VIDPELGID+VNLGLIYEIRF DNG TEIDMTLTT 
Sbjct: 1 MSDTPKYTQDQVIAIKNRILEMjEWIDPELGIDVVNLGL,IYEIRFNDIS!GYTEIDMTLTT 60 

Query: 61 MGCPIADLLTDQIHDVMKTVPEVTETEVKIiVWYPAWSVDKMSRYARIALGIR 112 

MGCPLADLLTD IHD ++ VPEVT+TEVKLVWYPAW+VDKMSRYARIALGIR 
Sbjct: 61 MGCPLADLLTDYIHDALQDVPEVTKTEVKLVWYPAWTVDKMSRYARIALGIR 112 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
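
The summary line that heads each alignment (for example, "Identities = 90/112 (80%), Positives = 102/112 (90%)") can be read mechanically. The following is a minimal Python sketch, not part of the original analysis; the helper name parse_alignment_summary and the returned field names are hypothetical, and the pattern assumes only the "name = count/length (percent%)" layout seen in the summary lines of this document.

```python
import re

# Hypothetical helper: matches "Identities = 90/112 (80%)" style fields,
# tolerating the stray spaces that appear in some summary lines.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_alignment_summary(line):
    """Return {statistic name: counts and percentages} for one summary line."""
    stats = {}
    for name, count, length, printed in _STAT.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,                   # matching positions
            "length": length,                 # aligned length
            "printed_percent": int(printed),  # percentage as printed in the text
            "ratio": count / length,          # exact fraction, recomputed
        }
    return stats

summary = parse_alignment_summary(
    "Identities = 90/112 (80%), Positives = 102/112 (90%)"
)
for name, s in summary.items():
    print(f"{name}: {s['count']}/{s['length']} = {s['ratio']:.4f} "
          f"(printed {s['printed_percent']}%)")
```

The printed whole-number percentage is kept separately from the recomputed ratio, since the two need not agree exactly after rounding.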

Example 1223 

A DNA sequence (GBSx1299) was identified in S.agalactiae <SEQ ID 3795> which encodes the amino
acid sequence <SEQ ID 3796>. This protein is predicted to be RNA polymerase sigma factor, sigma-70 
family (rpoD). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3157 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to the sigma-42 protein from S.mutans: 

>GP:BAA21507 GB:AB000631 sigma 42 protein [Streptococcus mutans]
Identities = 345/367 (94%), Positives = 358/367 (97%) 

Query: 14 EKKSNTTFNVQVADFIRNHKKQGTAIDDEVTEKLVIPFVLDADQIDDLLERLTDGGISIT 73 

+KK ++TFNVQVADFIRNHKK+G A+DDEVTEKLVIPF L+A+QIDDLLERLTDGGISIT 
Sbjct: 5 KKKTSSTFNVQVADFIRNHKKEGVAVDDEVTEKLVIPFELEAEQIDDLLERLTDGGISIT 64 

Query: 74 DKEGNPSTKYvVEGPKPEELTDEELIGSNSAKVNDPVRMYLKEIGWPLLTNEEEKEIiAV 133 

D+EGNPSTKY VE KPEELTDEEL+GSNSAKVNDPVRMYLKEIGWPLLTNEEEKELA+ 
Sbjct: 65 DREGNPSTKYAVEEIKPEELTDEELLGSNSAKVOTPVRIWLKEIGWPLLTNEEEKELAI 124 

Query: 134 AVAEGDLMAKQRLAEANLRLWSIAI<RYVGRGMQFLDLIQEGNMGLMICAVDKFDYSKGFK 193 

AV GDL AKQRLREANLRLWSIAfCRYVGRGMQFLDLIQEGNMGLMKAVDKPDYSKGFK 
Sbjct: 125 AVENGDLEAKQRLAEANLRLWSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKGFK 184 

Query: 194 FSTYATWWIRQAITRAIADQARTIRIPVHMVETINKBVREQRNLLQELGQDPTPEQIAER 253 

FSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIAER 
Sbjct: 185 FSTYATWWIRQAITRAIADQARTIRIPVHKVETINKLVREQRNLLQELGQDPTPEQIAER 244 

Query: 254 MDMTPDKVRE I LKIAQEPVSLET? IGEEDDSHLGDFI 3DEVIENPVDYTTRWLREQLDE 313 

MDMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFI3DEVIENPVDYTTRWLREQLDE 
Sbjct: 245 MDMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRWLREQLDE 304 

Query: 314 VLDTLTDREENVLRLRFGLDDGKMRTLEDVGKVFNVTRERIRQIEAKALRKLRHPSRSKQ 373 

Sbjct: 305 \ 

Query: 374 LKDFMED 380 

L+DF+ED 
Sbjct: 365 LRDFVED 371 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3797> which encodes the amino acid
sequence <SEQ ID 3798>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1788 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 351/369 (95%), Positives = 364/369 (98%) 

Query: 12 MAEKKGNTTFNVQVADFIRNHKKQGTAIDDEVTEKLVIPFVLDADQIDDLLERLTDGGIS 71 

M ++K TTFNVQVA+FIR+HKK+GTAIDD+VTEKLVIPF LDADQIDDLLERLTDGGIS 
Sbjct: 1 MTKQKEITTFNVQVAEFIRHHKKEGTAIDDDvTEKLVIPFALDADQIDDLLERLTDGGIS 60 

Query: 72 ITDKEGNPSTKYVVEGPKPEELTD3ELIGSNSAKVNDP\'RMyLKEIGVVPLLTNEEEKEL 131 

ITDKEGNPS+KY+VE PKPEELTDEELIGSNSAKVNDPVRMYLKEIGWPLLT+EEEKEL 
Sbjct: 61 ITDKEGNPSSKYIVEEPKPEELTDEELIGSNSAKVNDPVRMYLKEIGWPLLTSEEEKEL 120 

Query: 132 AVAVAEGDL^KQRLAEANLRLWSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKG 191 

AVAVA+GDLMAKQRLAEANLRLWSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDySKG 
Sbjct: 121 AVAVAKGDLMAKQRLAEANLRLWSIAKRYVGRGMQFLDLIQEGNMGLMKAVDKFDYSKG 180 

Query: 192 FKFSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIA 251 

FKFSTYATWWIRQAITRAIADQARTIRIPVHMVETINKLVREQRNLLQELGQDPTPEQIA 
Sbjct: 181 FKFSTYATWWIRQAITRAIADQARTIRIPVHtWETINKLvREQRNLLQELGQDPTPEQIA 240 

Query: 252 ERMDMTPDKvREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRWLREQL 311 

ERM+MTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRWLREQL 
Sbjct: 241 ERMEMTPDKVREILKIAQEPVSLETPIGEEDDSHLGDFIEDEVIENPVDYTTRWLREQL 300

Query: 312 DEVLDTLTDREENVLRLRFGLDrXjKMRTLEDVGKVFNVTRERIRQIKAKALRKLRHPSRS 371 

DEVLDTLTDREENVLRLRFGLDDGKMRTLEDVGKvFNVTRERIRQIEAKALRKLRHPSRS 
Sbjct: 301 DEVLDTLTDREEW/LRLRFGLDDGKMRTLEDVGKVFNVTRERIRQIEAKALRKLRHPSRS 360 

Query: 372 KQLKDFMED 380 

KQL+DF+ED 
Sbjct: 361 KQLRDFIED 369 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1224 

A DNA sequence (GBSx1300) was identified in S.agalactiae <SEQ ID 3799> which encodes the amino
acid sequence <SEQ ID 3800>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2853 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
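
Each "Final Results" block lists a certainty value for the cytoplasm, membrane and outside compartments, and the predicted localisation is the compartment with the highest value. As an illustration only (it is not part of the original analysis), the Python sketch below reads such a block and reports that compartment; the function names parse_final_results and predicted_site are hypothetical, and the pattern is deliberately tolerant of stray spaces inside the number.

```python
import re

# Hypothetical pattern for lines such as
# "bacterial cytoplasm --- Certainty=0.2853 (Affirmative)".
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\b.*?"
    r"Certainty\s*=\s*([\d\s.]+?)\s*\(\s*([^)]+?)\s*\)"
)

def parse_final_results(text):
    """Map each compartment to (certainty, verdict)."""
    results = {}
    for line in text.splitlines():
        m = _RESULT.search(line)
        if m:
            site, raw, verdict = m.groups()
            # Remove any whitespace inside the number, e.g. "0 . 0000".
            results[site] = (float(re.sub(r"\s+", "", raw)), verdict)
    return results

def predicted_site(results):
    """Return the compartment with the highest certainty."""
    return max(results, key=lambda site: results[site][0])

block = """\
bacterial cytoplasm --- Certainty=0.2853 (Affirmative)
bacterial membrane --- Certainty=0 . 0000 (Not Clear)
bacterial outside --- Certainty=0.0000 (Not Clear)
"""
results = parse_final_results(block)
print(predicted_site(results), results)
```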

Example 1225

A DNA sequence (GBSx1301) was identified in S.agalactiae <SEQ ID 3801> which encodes the amino
acid sequence <SEQ ID 3802>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2193 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA03516 GB:D14690 DNA primase [Lactococcus lactis] 
Identities = 206/398 (51%), Positives = 294/398 (73%), Gaps = 6/398 (1%) 

Query: 37 LAIDKEKISEIKNSWIVDVIGEVVGLTKTGRNHLGLCPFHKEKTPSFNVIEDRQFFHCF 96 

+++D E ++++K+ VNI D+I + V L++TG+N++GLCPFH EKTPSFNV ++ F+HCF 
Sbjct: 2 VSLDTEVVroLKSKVNIADLISQYVALSRTGKNYIGLCPFHGEKTPSFNVNAEKGFYHCF 61 

Query: 97 GCGRSGDVFKFVEDYQHISFLDSVQVLAERSGIPLDTNFKGQVPKKPKANQSLLDIHRVA 156

GCGRSGD +F+++Y + F+D+V+ LA+ +G+ L N +K N L +1+ A 

Sbjct: 62 GCGRSGDAIEFLKEYNQVGFVDAVKELADFAGVTL--NISDDREEKNNPNAPLFEINNQA 119 

Query: 157 SGFYHAYLMTTNDGERARQYLAERGvTEDLIKHFQIGLSPGGQDFLYRRLAKEFDEKTLM 216 
+ Y+ LM+T GERAR+YL ERG+T+D+IK F IGL+P DF+++ L+ +FDE+ +

Sbjct: 120 ARLYNILLMSTELGERARKYLEERGITDDVIKRFNIGIiAPEENDFIFKNLSNKFDEEIMA 179 

Query: 217 SSGLFNYSENSNQFYDSFNNRIMFPLTNDIGEVIAFSGRVWTQEDIDRKQAKYKNSRATP 276 
SGLF++S +N+ +D+F NRIMFP+TN+ G+ I FSGR W QE+ D K AKY N+ AT 
Sbjct: 180 KSGLFHFS--NNKVFDAFTNRIMFPITNEYGQTIGFSGRKW-QENDDSK-AKYINTSATT 235

Query: 277 IFNKSYELYHLDKARAVINKAHEVYLMEGFMDVIAAYRAGIENWASMGTALONEirVWIL 336 

IF+KSYEL++LDKA+ I+K HEVYLMEGFMDVIA+Y+AGI NWASMGTALT +HVR L 
Sbjct: 236 IFDKSYELWm,DKAKPTISKQHEVYLMEGFMDVIASYKaGINNWASMGTALTEKHVRRL 295 

Query: 337 KRFTKKWLTYDGDRAGQNAIDKSLELLSDMTVDIVRIPNKMDPDEFLQANSAEDFKQLL 396 

K+ KK VL YDGD AGQNAI K+++L+ + V IV++P +DPDE+ + + L+ 
Sbjct: 296 KQMAKKFVLVYDGDSAGQNAIYKAIDLIGESAVQIVKVPEGLDPDEYSKNYGLKGLSALM 355 

Query: 397 ENGRISNTEFYIHYLKPENTDNLQSEIAYVEKIAKLIA 434

E GRI EF I YL+PEM NLQ+++ ++E+I+ +IA 
Sbjct: 356 ETGRIQPIEFLIDYLRPENIANLQTQLDFIEQISPMIA 393 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3803> which encodes the amino acid
sequence <SEQ ID 3804>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3532 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 378/604 (62%), Positives = 477/604 (78%), Gaps = 2/604 (0%)

Query: 28 MGYFCGGHDLAIDKEKISEIKNSVNIVDVIGEWGLTKTGRNHLGLCPFHKEKTPSFNVI 87 

MG+ GG DLAIDKE IS++KNSVNIVDVIGEW L+++GR++LGLCPFHKEKTPSFNV+ 
Sbjct: 1 MGFLWGGDDLAIDKEMISQVKNSVNIVDVIGEWKLSRSGRHYLGLCPFHKEKTPSFNW 60 

Query: 88 EDRQFFHCFGCGRSGDVFKFVEDYQHISFLDSVQVLAERSGIPLDTNFKGQV--PKKPKA 145

EDRQFFHCFGCG+SGDVFKF+E+Y+ + FL+SVQ++A+++G+ L+ V +

Sbjct: 61 EDRQFFHCFGCGKSGDVFKFIEEYRQVPFLESVQIIADKTGMSENIPPSQAVLRSQHKHP 120 

Query: 146 NQSLLDIHRVASGFYHAYLMTTlJDGERftRQYIiAERGVTEDLIKHFQIGLSPGGQDFLYRR 205 

N +L+ +H A+ FYHA LMTT G+ AR+YL +RG+ + LI+HF IGL+P D+LY+ 
Sbjct: 121 NHALMTLHEDAAKFYHAVLMTTTIGQEARKYLYQRGLDDQLIEHFMIGIAPDESDYLYQA 180 

Query: 206 LAKEFDEKTLMSSGLFNYSENSNQFYDSFMNRIMFPLTNDIGEVIAFSGRVWTQEDIDRK 265 



Query: 266 QAKYKNSRATPIFNKSYELYHLDKARAVINKAHE^/YLMEGFMDVIAAYRAGIENWASMG 325 

QAKYKNSR T +FNKSYELYHLDKAR VI K HEV+LMEGFMDVIAAYR+G EN VASMG 
Sbjct: 241 QAKYKNSRGTVLFNKSYELYHLDKARPVIAKTHEVFLMESFMDVIAAYRSGYENAVASMG 300 

Query: 326 TALTHEHWHLIO?FTKKOTLTYDGDRAGQNAIDKSLELLSDMTVDIVRIPNKMDPDEFLQ 385 

TALT EHV HLK+ TKKWL YDGD AGQ+AI KSLELL D V+IVRIPNKMDPDEF+Q 
Sbjct: 301 TALTQEHVNHLKQVTKKVVLIYDGDDAGQHAIAI<BLELLI<DFVVEIWIPNKNDPDEFVQ 360 

Query: 386 ANSAEDFKQLLENGRISNTEFYIHYLKPENTDNLQSEIAYVEKIAKLIAKSPSITAQNSY 445 

+S E F LL+ R1S+ EF+I YLKP N DNLQS+I YVEK+A LIA+SPSITAQ+SY 
Sbjct: 361 RHSPEAFADLLKQSRISSVEFFIDYLKPTNTONLQSQIVYVEKMAPLIAQSPSITAQHSY 420 

Query: 446 ITKVAELLPDFDYFQVEQSVNNERLHHRSQQQASSSVQTSATVQLPQTGKLSAITKTEMQ 505 

I K+A+LLP+FDYFQVEQSVN R+ R + Q + S V LP L+AI KTE 
Sbjct: 421 1 

Query: 506 I 

L HRLL+H YLLNEFR+RD+FYFDT+ +++LY+ LK+ G ITSYDLS+ S++VNR YY + 
Sbjct: 481 LMHRLLHHDYLLNEFRHRDDFYFDTSTLELLYQRLKQQGHITSYDLSEMSEEVNRAYYNV 540 

Query: 565 LEEQLPVEVSIGEIFAVEKARDRLLKERDIiRKQSQLIRQSSNQGDEEGALAALENLIAQK 625 

LEE LP EV++GEI+ + R +LL ERDL KQ + +R+SSN+GD + AL LE+ IAQK 
Sbjct: 541 LEENLPKEVALGEIDDILSKRAKLIAERDLHKQGKKVRESSNKGDHQAALEVLEHFIAQK 600 

Query: 626 RNME 629 
R ME 

Sbjct: 601 RKME 604 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1226 

A DNA sequence (GBSxl302) was identified in S.agalactiae <SEQ ID 3805> which encodes the amino 
acid sequence <SEQ ID 3806>. Analysis of this protein sequence reveals the following: 
Possible site: 47 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.05 Transmembrane 41 - 
INTEGRAL Likelihood = -5.79 Transmembrane 93 - 



Final Results 

bacterial membrane --- Certainty=0.3421 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9995> which encodes amino acid sequence <SEQ ID 9996> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC38560 GB:AF029731 large conductance mechanosensitive channel 
[Staphylococcus aureus] 
Identities = 64/126 (50%), Positives = 83/126 (65%), Gaps = 8/126 (6%)

Query: 23 MIKELKEFLFKGNVLDLAVAVILGAAFNAIITSLVKDVITPLILNPVLKAAGVSNIA-QL 81
M+KE KEF KGNVLDLA+AV++GAAFN II+SLV+++I PLI K G + A + 

Sbjct: 1 MLKEFKEFALKGNVLDLAI AWMGRAFNK1 1 SSLVENI IMPLI GKI FGSVDFAKEW 56

Query: 82 SVTOGVAYGNFLSAVINFLIVGTTLFFIVKAANKVMAKKPAEEEIIEVVEPTQEQLIAEIR 141 

S+ G+ YG F+ +VI+F+I+ IiF VK AN +M K+ AEE E V LL EIR 

Sbjct: 57 SFWGIKYGLFIQSVIDFIIIAFALFIFVKIflNTLMKKEEAEE EAWEENWLLTEIR 113 

Query: 142 DLLANK 147

DLL K 
Sbjct: 114 DLLREK 119 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3807> which encodes the amino acid 
sequence <SEQ ID 3808>. Analysis of this protein sequence reveals the following:

Possible site: 28 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.95 Transmembrane 71 - 87 ( 57 - 90)

Final Results

bacterial membrane --- Certainty=0.3378 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15653 GB:Z99122 similar to large conductance mechanosensitive 
channel protein [Bacillus subtilis] 
Identities = 61/126 (48%), Positives = 77/126 (60%), Gaps = 7/126 (5%) 

Query: 1 MVKELKAFLFRGNIIELAVAVIIGGAFGAIVTSFVNDIITPLILNPALKAANVENITQLS 60

M E KAF RGNI++LA4 V+IGGAFG IVTS VNDII PL+ L + ++ 
Sbjct: 1 MWNEFKAFAMRGNIVDLAIGVVIGGAFGKIVTSLVNDIIMPLV-GLLLGGLDFSGLSFTF 59 

Query: 61 WNG-VKYGSFLGAVINFLIIGTSLFFWKAAEKAMPKKE KEAAAPTQEELLTEIR 114 

+ VKYGSF+ ++NFLII S+F V++ KKE E A QEELL EIR

Sbjct: 60 GDAWKYGSF IQT IVNFLI I SFS I F I VIRTLNGLRRKKEAEEEAAEEAVDAQEELLKEI R 119 

Query: 115 DLLAQK 120 
DLL Q+ 

Sbjct: 120 DLLKQQ 125

An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/125 (68%), Positives = 99/125 (78%), Gaps = 5/125 (4%) 

Query: 23 MIKELKEFLFKGNVLDLAVAVILGAAFNAIITSLVKDVITPLILNPVLKAAGVSNIAQLS 82

M+KELK FLF+GN+ + +LAVAVI +G AF AI+TS V D+ITPLILNP LKAA V NI QLS 
Sbjct: 1 MVKELKAFLFRGNI IEIAVAVIIGGAFGAIWSFVITOIITPLIIjNPALKAANVENITQLS 60 

Query: 83 OTGVAYGNFLSAVINFLIVGTTLFFIVKAANIOTffilCKPAEEEIIEVVEPTQEQLLAEIRD 142 
WNGV YG+FL AVINFLI+GT+LFF+VKAA K M KK E PTQE+LL EIRD

Sbjct: 61 fMGVICYGSFLGAVINFLIIGTSLFFVVKAAEKAMPKKEK EAAAPTQEELLTEIRD 115 

Query: 143 LLANK 147 
LLA K 

Sbjct: 116 LLAQK 120

A related GBS gene <SEQ ID 8753> and protein <SEQ ID 8754> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
Net Charge of CR: 1
McG: Discrim Score: 4.39 
GvH: Signal Score (-7.5): -1.79 
Possible site: 25 
>>> Seems to have a cleavable N-term signal seq.

Amino Acid Composition: calculated from 26 
ALOM program count: 1 value: -5.79 threshold: 0.0 

INTEGRAL Likelihood = -5.79 Transmembrane 71 - 87 ( 68 - 90) 
PERIPHERAL Likelihood =1.06 28 
modified ALOM score: 1.66

icml HYPID : 7 CFP: 0.331 



Final Results

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF0054K367 - 741 of 1041) 

SP|068285|MSCL_STAAU(1 - 119 of 120) LARGE -CONDUCTANCE MECHANOSENSITIVE CHANNEL. 
GP|3135292|gb|AAC38560.1| |AF029731 large conductance mechanosensitive channel
{Staphylococcus aureus}
%Match = 14.9

%Identity = 53.3 %Similarity = 70.5

Matches = 65 Mismatches = 31 Conservative Sub.s = 21

177 207 237 267 297 327 357 387 

30 Q VMTSTEITHYS FTFDYI I FS FLCKFFQKLFQGFLLH* FNI KIYR* FETYYLDFSKE I CYNERELNNI KELVHMI KELKE 

1=11=11 
MLKEFKE 

417 447 477 507 537 561 591 621 

FLFKGNVLDLAVAVILGAAFNAIITSLVKDVITPLILNPVLKAAGVSNIAQLSWN--GVAYGNFLSAVINFLIVGTTLFF
I =lll!]]||=II==llill !l=lll===l Hi 11= 1= 1= 1= 11 1= =11=1=1= II 

FALKGNVLDLAIAWMGAAFNKI ISSLVENIIMPLI GKIFGSVDFAK-EWSFWGIKYGLFIQSVIDFI I IAFALFI 

20 30 40 50 60 70 80 

651 681 711 741 771 801 831 861

IvXAANKVMAKKPXEEEIIEVVEPTQEQLI^EIRDLLAE^**KTRITEFFY*LIVIIYEKTAQF*TVFSYSI*LEFFTFA 

II II =1 1= III 111 II llllll I 

FVKIANTLMKKEEAEEE - - AWE - ENWLLTE IRDLLREKK 
100 110 120 

SEQ ID 8754 (GBS354) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 3; MW 17kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1227

A DNA sequence (GBSx1303) was identified in S.agalactiae <SEQ ID 3809> which encodes the amino
acid sequence <SEQ ID 3810>. This protein is predicted to be 30S ribosomal protein S21-related protein.
Analysis of this protein sequence reveals the following: 

Possible site: 29 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.6479 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9391> which encodes amino acid sequence <SEQ ID 9392> 
was also identified. A related GBS nucleic acid sequence <SEQ ID 10799> which encodes amino acid
sequence <SEQ ID 10800> was also identified.

The protein is similar to the 30S ribosomal protein S21 from Listeria monocytogenes: 

>GP:BAAS2793 GB:AB023064 30S ribosomal protein S21 [Listeria monocytogenes] 
Identities = 30/34 (88%), Positives = 34/34 (99%) 

Query: 1 MTKAGTLQESRKREFYEKPSVKRKRKSEAARKRK 34

4+K+GTLQESRKREFYEKPSVKRK+KSEA7ARKRK 
Sbjct: 23 VSKSGTLQESRKREFYEKPSVKRKKKSEAARKRK 56

A related DNA sequence was identified in S.pyogenes <SEQ ID 3811> which encodes the amino acid 
sequence <SEQ ID 3812>. Analysis of this protein sequence reveals the following:

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4815 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 35/36 (97%), Positives = 36/36 (99%)

Query: 1 MTKAGTLQESRKREFYEKPSVKRKRKSEAARKRKKF 36

+TKAGTLQESRKREFYEKPSVKRKRKSEAARKRKKF
Sbjct: 35 VTKAGTLQESRKREFYEKPSVKRKRKSEAARKRKKF 70

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1228 

A DNA sequence (GBSx1304) was identified in S.agalactiae <SEQ ID 3813> which encodes the amino
acid sequence <SEQ ID 3814>. Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.06 Transmembrane 5 - 21 ( 3 - 23) 
INTEGRAL Likelihood = -2.28 Transmembrane 191 - 207 ( 189 - 207) 

Final Results 

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8755> and protein <SEQ ID 8756> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 8.58
GvH: Signal Score (-7.5): -5.71
Possible site: 18
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 2 value: -7.06 threshold: 0.0
INTEGRAL Likelihood = -7.06 
INTEGRAL Likelihood = -2.28 
PERIPHERAL Likelihood = 4.35 
modified ALOM score: 1.91 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8756 (GBS259) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 45 (lane 4; MW 54kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1229 

A DNA sequence (GBSx1305) was identified in S.agalactiae <SEQ ID 3815> which encodes the amino
acid sequence <SEQ ID 3816>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 135 - 152 ( 135 - 152)



Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD47593 GB:AF140784 Vexp2 [Streptococcus pneumoniae] 
Identities = 117/212 (55%), Positives = 152/212 (71%) 

Query: 1 MLELKNIAYRYKGNDNKTLENINYSFQSGVFYTILGNSGSGRTTLLSLMAGLDSPTEGQV 60 

+L+L+++ YRYK L INY+F+ G FY+I+G SG+GK+TLLSL+AGLDSP EG + 

Sbjct: 3 LLQLQDvTYRYKNTAEAVLYQINYNFEPGKFYSIIGESGAGKSTLLSLIiAGLDSPVEGSI 62 

Query: 61 LFNKKDIKEAGYAQHRKKNIALVFQNYNLLDYLTPLENVQLVKPTADKQLLLDLGLKEDM 120 

LF +DI++ GY+ HR + I+LVFQNYNL+DYL+PLEN++LV A K LL+LGL E 
Sbjct: 63 LFQGEDIRKKGYSYHRMHHISLVFQNYNLIDYLSPLENIRLVNKKASKNTLLELGLDESQ 122 

Query: 121 LTRNILRLSGGQQQRVAIARALWGTPAILLDEPTGNLDFDISRDITMRLKDFAHKEKRC 180 

+ RN+L+LSGGQQQRVAIAR+LV P IL DEPTGNLD + DI LK A K +C 
Sbjct: 123 IKRNVLQLSGGQQQRVAIARSLVSEAPVIIADEPTGNLDPKTAGDIVELLKSLAQKTGKC 182 

Query: 181 VIMVTHSREIAHMADTALQLIGDNLKELSKES 212 

VI+VTHS+E+A +D L+L L E S 
Sbjct: 183 VI WTHSKEVAQASD I TLELKDKKLTETRNTS 214 

SEQ ID 3816 (GBS363) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 5; MW 28kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 10; MW 53kDa). 

GBS363-GST was purified as shown in Figure 216, lane 9. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1230 

A DNA sequence (GBSx1306) was identified in S.agalactiae <SEQ ID 3817> which encodes the amino
acid sequence <SEQ ID 3818>. This protein is predicted to be Vexp3. Analysis of this protein sequence
reveals the following: 

Possible site: 47 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-14.97 Transmembrane 71 - 87 ( 66 - 97) 
INTEGRAL Likelihood = -3.61 Transmembrane 2 - 18 ( 1 - 18)

Final Results 

bacterial membrane --- Certainty=0.6986 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1231 

A DNA sequence (GBSx1307) was identified in S.agalactiae <SEQ ID 3819> which encodes the amino
acid sequence <SEQ ID 3820>. This protein is predicted to be Vexp3. Analysis of this protein sequence 
reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1986 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1232 

A DNA sequence (GBSx1308) was identified in S.agalactiae <SEQ ID 3821> which encodes the amino
acid sequence <SEQ ID 3822>. This protein is predicted to be Vexp3. Analysis of this protein sequence
reveals the following:

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.05 Transmembrane 22 - 38 ( 17 - 39)

Final Results

bacterial membrane --- Certainty=0.3421 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD47594 GB:AF140784 Vexp3 [Streptococcus pneumoniae] 
Identities = 39/153 (25%), Positives = 67/153 (43%), Gaps = 9/153 (5%)

Query: 3 LPKRSFLYVSRKKRKSITLFVCLWLVASTLISGIAVKNAGLTA-KKTFSRQTGSILHISS 61 

+ +F YV+RK KSI +F+ + L+AS + G+++K A A ++TF T S + 
Sbjct: 1 MLHNAFAYOTRKFFKSIVIFLIILLMASLSLVGLSIKGATAKASQETFKNITNS-FSMQI 59 

Query: 62 DSTDLVGDGYGSGEIPEKAIVNIASNPNvT01VNNNLMAYAGLTSEK>IVTRPNDKEQYKE- 120 

+ G G+G I + I I M ++ + A LT ++ P K+ 

Sbjct: 60 NRRVNQGTPRGAGNIKGEDIKKITENKAIESYUKRINAIGDLTGYDLIETPETKKNLTAD 119 

Query: 121 QVLQVHGNSYSDTDPKYTAGMISLKGG 147 

L+G+S+K++G L G 
Sbjct: 120 RAKRFGSSLMITGVNDSSKEDKFVSGSYKLVEG 152 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1233 

A DNA sequence (GBSx1309) was identified in S.agalactiae <SEQ ID 3823> which encodes the amino
acid sequence <SEQ ID 3824>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-15.76 Transmembrane 295 - 311 ( 287 - 317) 
INTEGRAL Likelihood = -7.59 Transmembrane 49 - 65 ( 46 - 69) 
INTEGRAL Likelihood = -6.90 Transmembrane 340 - 356 ( 339 - 362) 
INTEGRAL Likelihood = -5.57 Transmembrane 411 - 427 ( 404 - 430) 

Final Results 

bacterial membrane Certainty=0.7305 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco
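
Each "Final Results" block assigns a certainty value to the membrane, outside and cytoplasm locations, and the predicted location is the one with the highest value. The sketch below is an illustration only: it reads such a block and picks that location. The helper names are hypothetical, and the pattern is deliberately tolerant of stray spaces inside the number, which is an assumption about how the text may have been extracted rather than a property of the prediction program.

```python
import re

# location name, optional "---" separator, certainty (possibly with stray spaces), label
LINE_RE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, label)} for one Final Results block."""
    results = {}
    for location, raw_value, label in LINE_RE.findall(text):
        certainty = float(raw_value.replace(" ", ""))  # "0. 73 05" -> 0.7305
        results[location] = (certainty, label.strip())
    return results


def best_location(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


sample = """
bacterial membrane Certainty=0. 73 05 (Affirmative) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(sample)
print(best_location(results), results[best_location(results)])
```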

A related GBS nucleic acid sequence <SEQ ID 9695> which encodes amino acid sequence <SEQ ID 9696> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12182 GB:Z99106 similar to transporter [Bacillus subtilis] 
Identities = 95/370 (25%) , Positives = 167/370 (44%) , Gaps = 41/370 (11%) 

Query: 109 ESVEASLSIDVGSRLKSVSPYNSS KEENQVTIAGYQSTEDLRAFQTKALVLK 160 

+++E+S S D S S + NS + +++ G ST + F + 

Sbjct: 115 DAIESSSSSDSSSSSSSSNAKNSQGGGQGGPQMVQADLSIEGVISTALVDDFSDGDSKIT 174 

Query: 161 KGSHLAADNT- -KQVLVPLKLAQKNHLSVGNKLRLGK ENVT IAGIYDANSA- - 209 

G + + K ++ LA++N LSVG+ + + E+ T I GIY S+ 

Sbjct: 175 DGRAITKSDVGKKVTVINETLAEENDLSVGDSiTIESATDEDTTVKLKIVGIYKTTSSGD 234 

Query: 210 -KSKNTFNPNIDNTLIAQATLVRKISKQKGYQTV AVRLSDKRLVDTVIQNIKQWPLD 265 

+++N NNL T ■+ T+ +D + +DT ++ K+ +D 

Sbjct: 235 DQAQNFSFLNPYNKLYTPYTATAALKGDDYKNTIDSAVYYITODAKNMDTFVKAAKKTSID 294 

Query: 266 FGKLDVQTAKEFYGDSYRNIETLHRLVGRIILIVSLVAMAILVVMLTFWINNRIKETGIL 325 

F + T + Y IE + ++ +VS+ IL +++ IRE G+L 

Sbjct: 295 FDTYTIiNTNDQLYQQMVGPIEtWASFSK^A/YLYSVAGAVILGLIVI^SIRERKYEMGvL 354 
Query: 326 LAIGKTKFEI IGHYLIEVLLVAGAAFTLS I IGGVFLGKTFAAGLLSQV 373

+AIG+ ++++IG +L E+L+VA A L+ + G + LLSQ 
Sbjct: 355 MAIGEKRWKLIGQFLTEILIVAVIAIGLAS\TGI^LVANQLGKQLLSQQISSSTDSTQTAS 414 

Query: 374 NGGVSSQIVQNSSLI IDRIDNLAVSVGVMDVFRLYAQGALI CLFAWLSSYS IL 427 

GG+ ++ +SS +D ID+L V+V + D+ L G LI + A +L S S+L 
Sbjct: 415 GQMPGGGGGMGGKMFGHSSSimiViaSLKVAVSt^NDMLILGGIGIIjIAIIATLLPSISVL 474 

Query: 428 KLQPKQILSR 437 

+L PK IL++ 
Sbjct: 475 RLHPKTILTK 484 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8757> and protein <SEQ ID 8758> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 1.50 
GvH: Signal Score (-7.5): -8.43 
Possible site: 39 
>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 4 value: -15.76 threshold: 0.0
INTEGRAL Likelihood =-15.76 Transmembrane 295 - 311 ( 287 - 317)
INTEGRAL Likelihood = -7.59 Transmembrane 49 - 65 ( 46 - 69)
INTEGRAL Likelihood = -6.90 Transmembrane 340 - 356 ( 339 - 352)
INTEGRAL Likelihood = -5.57 Transmembrane 411 - 427 ( 404 - 430)
PERIPHERAL Likelihood =
modified ALOM score: 3.65

Final Results

bacterial membrane --- Certainty=0.7305 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco


The protein has homology with the following sequences in the databases: 

ORF00687(421 - 1611 of 1917) 

EGAD|108957|BS0375 (11 - 484 of 486) hypothetical protein {Bacillus subtilis}
OMNI|NT01BS0429 membrane transport protein GP|1805444|dbj|BAA09006.1||D50453 homologue of
hypothetical protein in a rapamycin synthesis gene cluster of Streptomyces hygroscopicus
{Bacillus subtilis} GP|2632675|emb|CAB12182.1||Z99106 similar to transporter {Bacillus
subtilis} PIR|F69762|F69762 transporter homolog yell - Bacillus subtilis
%Match =8.6 

%Identity =28.7 %Similarity =52.2 

Matches = 117 Mismatches = 184 Conservative Sub.s = 96 



372 



492 



522 



vL*NH*LIDNVEVDREYLTTSIVILEIIKIEKGGKIVNLWTLSMYLKRQKMKTVTLFLVFLTIGTCLISLMSIQHSLEK 
:| :| ||: ::|| | ::| ::|| : :| 
MNFIKRAFWNMKAKKGKTLLQLFVFTVICVFVLSGLAIQSAAQK 



N ILTKQGKS IYLTSKEKAYWPEQAYEALKK- - 

: I 1= I = =1 1= 



KEENQVTLAGYQSTEDLRAFQTKAL^/LKKGSHIA-ADNTKQV-LVPLKLAQKNHLSVG 

I I : II : ::: 1 |l = I : | : :| |:| :: ||::| MM 

SSSSSSSSNAKNSQGGGC^GPQIWQADLSIEGVISTALVTIDFSDGDSKITDGRAITKSDVGKKVW 
885 903 954 978 1008 1065

NKLRL- - -GKENVTI AGIYDANSA KSKNTFNPNIDNTLIA- - QATLVRKI SKQKGYQTVAVR- LSDKRLVDTV 

I: h III |: s::| III I I I I || : | : :|| 

DSITIESATDEDTTOKLKIVGIYKTTSSGDDQAQNFSFLNPYKKLYTPYTATAALKGDDYIOm'IDSAVYIfMDDAKNMDTF 
220 230 240 250 260 270 280 

1095 1125 1155 1185 1215 1245 1275 1305 

IQNIKQWPLDFGK1DVQTAKEFYGDSYRWIETLHRLVGRIILIVSLVAMAILVVMLTFWINNRIKETGILIAIGKTKFEI 
:: h =11 : I s:| || s : :: :||: || ::: 111 |s|:|||: 

VKAAKKTSIDFDTYTIjNTMQLYQQMVGPIENVASFSKKWYLV^ 

300 310 320 330 340 350 360 

1335 1365 1395 1431 1461 1491 

IGHYLIE VLLVAGAAFTLS I IGGVFLGKTFAAGLLSQV NGGVSSQIVQNSSLI IDRIDNIAV 

l|::| 1=1=11 I I: = I == = Mil 11= == =11 =1 11=1 I 

IGQFLTEILIVAVIAIGIASWGNLVANQLGNQLLSQQISSSTDSTQTASGQMPGGGGGMGGKMFGHSSSNVDVIDSLNV 
380 390 400 410 420 430 440 

1521 1551 1581 1611 1641 1671 1701 1731 

SVGVMDVFRLYAQGALICLFAVVLSSYSILKLQPKQILSRMS*EVNMNLFKRSFLYVSRKKRKSITLFVCLWLVASTLIS 
= I = I = = I I II = I =1 I 1=1=1=11 ll== 
AVSMNDMLILGGIGILIAIIATLLPSISVLRLHPKTILTKQE 
460 470 480 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1234 

A DNA sequence (GBSx1310) was identified in S.agalactiae <SEQ ID 3825> which encodes the amino
acid sequence <SEQ ID 3826>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside Certainty=0 . 3000 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11993 GB:Z99105 ybdG [Bacillus subtilis] 
Identities = 66/224 (29%) , Positives = 102/224 (45%) , Gaps = 22/224 (9%) 



Query: 84
Sbjct: 41

Query: 144
Sbjct: 101

Query: 204 !TP GMEGGKQVDF AAPVLKELPKIPKVSDDIN 241
P G++ K F +A E+ + ++D+
Sbjct: 161 IIAITOIGLPQQYVTYKLSGVDRLKVRGFHLLTSIC-FHRF1PSAVYNPEVIRQSFLTDEEK 220

Query: 242
t- AI K N++M+ E
Sbjct: 221



No corresponding DNA sequence was identified in S.pyogenes. 
SEQ ID 3826 (GBS121) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 24 (lane 9; MW 40kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 6; MW 65kDa). 

GBS121-GST was purified as shown in Figure 198, lane 6. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1235 

A DNA sequence (GBSx1311) was identified in S.agalactiae <SEQ ID 3827> which encodes the amino
acid sequence <SEQ ID 3828>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < suco

bacterial membrane --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 8759> which encodes amino acid sequence <SEQ ID 8760>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 3.70
GvH: Signal Score (-7.5): -0.0600004
Possible site: 22
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 8.01 threshold: 0.0
PERIPHERAL Likelihood = 8.01 167
modified ALOM score: -2.10

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8760 (GBS60) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 7; MW 38.6kDa). 

GBS60-His was purified as shown in Figure 193, lane 3.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1236 

A DNA sequence (GBSx1312) was identified in S.agalactiae <SEQ ID 3829> which encodes the amino
acid sequence <SEQ ID 3830>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 21

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 


A related GBS nucleic acid sequence <SEQ ID 9693> which encodes amino acid sequence <SEQ ID 9694> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8761> and protein <SEQ ID 8762> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: 19 Crend: 5
McG: Discrim Score: 9.85
GvH: Signal Score (-7.5): -0.28
Possible site: 21
>>> May be a lipoprotein

ALOM program count: 0 value: 9.07 threshold: 0.0 
PERIPHERAL Likelihood =9.07 99 
modified ALOM score: -2.31 

Final Results

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases:
37.0/57.2% over 118aa 

Bacillus subtilis 

EGAD | 108627 | hypothetical protein Insert characterized 
GP|2632485|emb|CAB11993.1||Z99105 ybdG Insert characterized
PIR|D69747|D69747 hypothetical protein ybdG - Insert characterized

ORF00608(553 - 906 of 1416) 

EGAD|108627|BS0200 (51 - 169 of 296) hypothetical protein {Bacillus subtilis}
GP|2632485|emb|CAB11993.1||Z99105 ybdG {Bacillus subtilis} PIR|D69747|D69747 hypothetical
protein ybdG - Bacillus subtilis

%Match =8.7 

%Identity =37.0 %Similarity =57.1 

Matches = 44 Mismatches = 50 Conservative Sub.s = 24 

339 369 399 429 459 489 519 549

ITKLSTVALSLLLCTACAASNTSTSKTQSHHPKQTKLTDKQKEEPKNKEAADQEMHPQGAvDLTKYKAKPVKDYGKKIDV 

MKTLWKVLKIVFVSLAALVLLVSVSVFIYHHFQLNKEAALLKGKGTVVD 
10 20 30 40 

579 609 639 669 699 729 759 789

GDGKKMNIYETGQGKIPIVFIPGQAEISPRYAYKNLIERLSKKyKIYTVEPLGyGLSDIPTKPRTLENITKEIHTGLNKI 

111111 = 1= I II Ih I =1 I I I = = ll= II 1= III l = = I II 

VDGKKMNVYQEGSGF^TFVFMSGSGIAAPAYEMKGLYSKFSKENKIAVVDRAGYGYSEVSHDDRDIDIVLEQTRKALMKS 
50 60 70 80 90 100 110 120 

816 846 876 906 936 966 996 1026 

GVTQJFY-IAAHSLGGMYSLNYAK1WPEETOGFIGMDTSTPVMEGEQ 
II II lb 1= := =1= 11=1== I II I I 
GNKPPYILMPHSISGIEAMYWAQKYPKEIKAIIAMDIGLPQQYVTYKLSGVDRLKVRGFHLLTSIGFHRFIPSAVYNPEV
140 150 160 170 180 190 200 

SEQ ID 8762 (GBS21) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 3; MW 31.6kDa).

GBS21-His was purified as shown in Figure 192, lane 11.

GBS21L was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 124 (lanes 8-10; MW 66.5kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 124 (lane 11; MW 41.5kDa) and in Figure 180
(lane 6; MW 41kDa). GBS21L-His was purified as shown in Figure 232 (lanes 3 & 4).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1237 

A DNA sequence (GBSx1313) was identified in S.agalactiae <SEQ ID 3831> which encodes the amino
acid sequence <SEQ ID 3832>. This protein is predicted to be endopeptidase O. Analysis of this protein
sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3854 (Affirmative) < suco

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF67832 GB:AF179267 endopeptidase PepO2 [Lactococcus lactis]
Identities = 21/36 (58%) , Positives = 26/36 (71%)

Query: 1 MRANIPVRNFQEFYDAFGVKKGDSMYLKPEKRLTLW 36
+RANIP N +EFY+ F VK+ D MY PEKRL +W
Sbjct: 592 LRANIPPTNLEEFYETFDVKETDQMYRAPEKRLKIW 627

There is also some homology to SEQ ID 2384: 

Identities = 13/36 (36%) , Positives = 25/36 (69%) 

Query: 1 MRANIPVRNFQEFYDAFGVKKGDSMYLKPEKRLTLW 36
+R N+ + NF F++ F +K+GD+M+ P+ R+ +W
Sbjct: 596 LRTNVTLTNFDAFHETFDIKEGDAMWRAPKDRVIIW 631

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1238 

A DNA sequence (GBSx1314) was identified in S.agalactiae <SEQ ID 3833> which encodes the amino
acid sequence <SEQ ID 3834>. This protein is predicted to be endopeptidase O. Analysis of this protein 
sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3801 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAR16168 GB:L18760 endopeptidase [Lactococcus lactis] 
Identities = 118/268 (44%) , Positives = 174/268 (64%) , Gaps = 6/268 (2%)

Query: 121
Sbjct: 440

Query: 181
Sbjct: 500

(match-line fragments; row positions not recoverable)
J M A+ VNAY P
- +++K + MI +DG++ + G +GKL J-+ENIAD GG+ A+L A K EK +K

Query: 241 NFLNHGQVFGVKKQPKNKVSPQFSQMFK 268
F+ K+KS+FQM +
Sbjct: AFFSQW AKIWRMKASKEFQQMLL 582

There is also homology to SEQ ID 2384: 

Identities = 110/253 (43%) , Positives = 161/ 



Query: 1
Sbjct: 324

Query: 61
Sbjct: 384

Query: 121
Sbjct: 444

Query: 181
Sbjct: 504

Query: 241
Sbjct: 564

(match-line fragments; row positions not recoverable)
++I VYK+RL+ WL+ T+ AI KL+ H
N K++ T+ ++N+ R W M A+ VNAY D
* IVFPAAI Q P Y ++ S NYG IGA+I HEISH+FD NG +DE G+L+DWWT
+ED +K++T ++ Q+DGL++ G KV+GKLT++EN+AD GGV +LEA ++E+



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1239 

A DNA sequence (GBSx1315) was identified in S.agalactiae <SEQ ID 3835> which encodes the amino
acid sequence <SEQ ID 3836>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9691> which encodes amino acid sequence <SEQ ID 9692> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC35997 GB:AF019410 endopeptidase O [Lactobacillus helveticus]
Identities = 85/315 (26%) , Positives = 146/315 (45%) , Gaps = 8/315 (2%) 



Query: 46 NVSPRENLYRAVNinmLMIT^ 104 

N P++NLY AVN WL+ ++ QTS +E++ K+++ ++ D A +ASGK + + 
Sbjct: 20 NAKPQDNLYLAVNSBWLSKAEIPADQTSAGVNTEDDIKIEKRMMKDFADIASGKEKMPDI 79 

Query: 105 DEQKJOWAYYKQGMDFKTRDKHGLKPLKPVLQKLEAVSSMKDFQSrAHDFVMSGFVLPFG 164 

+ K +A YK +F RD P++ LQK+ + + F+ A + M + LPF 

Sbjct: 80 RDFDKAIALYKIAKWFDKRDAEKANPIQNDLQKILDLlNFDKFKDNATELFMGPYAIiPFV 139 

Query: 165 LTVETNARDNSQKQLVLRQAPALLESPDQYKKGNKEGEAKLSAYRTSAMALLKQAGKSMI 224 

V+ + ++ L L YK E + L ++ LL+ AG 

Sbjct: 140 FDVDADMKNTDFNVLHFGGPSTFLPDTTTYK--TPEAKKLLDILEKQS1NLLEMAGIGKE 197 

Query: 225 EDRKLVKQAIAFDRLLSEKTQVDQSKITAESETARGRYNPESMETVHNYAKEFDFKELIE 284 

E R V+ A+AFD+ LS+ K T E A YNP S+ K FD 4- ++ 

Sbjct: 198 EARVYVQNALAFDQKLSKW KSTEEWSDYAAIYNPVSLTEFLAKFKSFDMADFLK 252 

Query: 285 KLVGPTNKAVNVEDKTYFKQA/NDVINSKQIA1^KAW1»MISMLVDQSDFLGEQNRQAASAF 344 

++ + v V + + +++IN +K WM++ + + +L + R AA F 

Sbjct: 253 TILPEKAffiRVIVMEPRFLDHADELINPANFDEIKGWMLVKYINSVAKYLSQDFRAAAFPF 312 

Query: 345 KNVASGLTQIESKEK 359 

SG ++ S+ K 
Sbjct: 313 NQAISGTPELPSQIK 327 

A related GBS gene <SEQ ID 8763> and protein <SEQ ID 8764> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 5.41 
GvH: Signal Score (-7.5): -1.39 

Possible site: 36 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 2.76 threshold: 0.0 
PERIPHERAL Likelihood = 2.76 151 
modified ALOM score: -1.05 

*** Reasoning Step: 3 

Final Results 

bacterial outside Certainty=0 .3000 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

SEQ ID 8764 (GBS12) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 7; MW 65kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 3 (lane 3; MW 39kDa). 

The GST-fusion protein was purified as shown in Figure 189, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1240 

A DNA sequence (GBSx1317) was identified in S.agalactiae <SEQ ID 3839> which encodes the amino
acid sequence <SEQ ID 3840>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.75 Transmembrane 301 - 317 ( 299 - 317)

Final Results

bacterial membrane --- Certainty=0.1702 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB42180 GB:A67181 unnamed protein product [unidentified] 
Identities = 245/771 (31%) , Positives = 410/771 (52%) , Gaps = 80/771 (10%) 

Query: 22 WVIVEFNKESILDYATEQKKIVAQLNQADVEKXLQSIKQEQDICVLKNIEKSVHFDSSKV 81 

VRVIV NK + D+ ++ + A + + +E+ +K Q+KV+K +E+ +KV 
Sbjct: 97 VRVIVSLNKSAAFDHTSKPTGSAASVKK- -ZEQASDQVKDGQEKVIKQVEE ITGNKV 151 

Query: 82 KR-YDAIINGVALDIQAQEIEKLKTIADVRRvYVSQEYVQTKPLLSSSGQLIGLPEVWNN 140 

+R + ++N ++D+ +I+K+K + V+ V + Y P S+ Q+ + +VW 
Sbjct: 152 RRQFGYLVNAFSIDMDIiDDIDKVKDLPQVKNVTPVKVY HPTDESADQMAQVQDVWQE 208 

Query: 141 SQYKGEGTVVAVIDSGVDFKHQALKIKEPNRAKYNKTSIE KLIHEKNLKGKFYSEK 196 

+ KGEG V+++ID+G+D HQ LK4 +K+ +E KL H GK+Y+EK 

Query: 197 VPYGYNYYDYNDNLKDS-YGVMHGMHOTGIVGA>TODNQKLYGvAPmQIIAMKVFSDDQQ 255 

VPYGYNY D ND + D+ G MHG HV GI GAN ++ GVAP+AQ+LAMKVFS++ + 
Sbjct: 264 VPYGYNYADKNDQIVDNGCGEMHGQHVAGIAGANG QVKGVAPDAQLLAMKVFSNNAK 320 

Query: 256 NPTTFTDVWLKALDDAILLKADWNMSLGTPAGFVHEGKDYPELEVIARACKAGIVIAVA 315 

N + D + A++D++ L ADV+NMSLG+ + V G P+ + +A+A +AG++ ++ 
Sbjct: 321 NSGAYDDDIISAIEDSVKLGADVINMSLGSVSSDV--GPSDPQQQAVAKASEAGVINVIS 378 

Query: 316 AGNE GNITDGNTYGVKPLAENYDTALIANPALDDOTIAVASMENLKK3JAHVLKFK-- 370 

AGN G+ DGN +E + + P + + Ii VAS EN K +K + 

Sbjct: 379 AGNSGVAGS TADGNP VHNTGTSE LSTVGrPGVTPDALTVASAENSKVTTDTVKDELG 435 

Query: 371 - DKKSGTEVTE VINIJIVAPNASKTI IGLAVDLGAGAPSELS - - KHFDLSGKI A 420 

+ K +VT + ++ K + VX1+G G + + K ++ G++A 
Sbjct: 436 GVTFSSNSELKGAAQVTTQLESNYSVLTKKLKL VDMGLGGADDYTAEKKAEVKGQLA 492 

Query: 421 MLEIPEDNKSNGFLEKVQAITKLNPAAILLYNNAKVKDDLGSQLLVESEAAKFMIARITR 480 

+++ + F KV A I++YN+ D L S L + +++ 

Sbjct: 493 WK RGAYTFSAKVANAi®AGAAGIVIYNSE--DDGLLSMSLDDKTFPTLGMSKADG 546 

Query: 481 STY NNIKNNSNKIITILTERQAIDNSLAGQLSSYSSWGPTPDLRLKPEITAPGGHI 536 

+ ++ + K T L IDNS AG++S ++SWGPTP+L KPEITAPGG I 

Sbjct: 547 KFWLKQQKKVRASRLKFGTAL IDNSRAGKMSDFTSWGPTPELDFKPEITAPGGKI 601 

Query: 537 FSTVEDNQYADKSGTSMAAPQVAGAAAVLKQYITDKKIPV--DNAADFIKLLLMNTAQPI 594 

+S DN+Y SGTSMA+P VAG+ A++ QI+ + ++ FK MNT+ P+ 
Sbjct: 602 YSLANDNKYQQMSGTSMAS PFVAGSEAL I LQG I KKQGLNLSGEELVQFAKNSAMNTSHPV 661 

Query: 595 IN-KQSKDGKTPYFVRQQGSGAMNLAKALVTTWATWGTNDNNADGKLELREL-KEKKF 652 

+ + +K+ +P R+QGSG +N+ A+ TV N +G L+E+ ++ F 

Sbjct: 662 YDTEHTKEIISP---RRCGSGEINVKDAINNTVEVKAA NGNGAAALKEIGRQTTF 713 

Query: 653 KARILLRNFGKTNKTYIISSEA — IADPVDEKGFRTQNSEHLVSKKADAVTRKVTVEAGK 710 

K + L N GK +TY + + + K +++ +V + T KVTV+ G+ 

Sbjct: 714 K- - VTLTNHGKKAQTYAVDNYGGPYTQATEAKSGEIYDTK- 1 VKGQLTTETPKVTVQPGE 770 

Query: 711 TIAVDLDVDYSDAEALTRNNFLEGYLNLK-DTEGVADLHLPFLGFYGSWTE 760 

+VD+ + + R NF+EGY+ + + +L LP++GF+GS+++ 
Sbjct: 771 --SVDVSFTLTLPYSFQRQNFVEGYVGFEAKDQATPNLVLPYMGFFGSYSQ 819 

A related GBS gene <SEQ ID 8767> and protein <SEQ ID 8768> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
GvH: Signal Score (-7.5): -6.06
Possible site: 15
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.75 threshold: 0.0 

INTEGRAL Likelihood = -1.75 Transmembrane 301 - 317 ( 299 - 317) 
PERIPHERAL Likelihood = 1.75 614 
modified ALOM score: 0.85 



Final Results 

bacterial membrane Certainty=0.1702 (Affirmative) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF00677(358 - 3159 of 3255) 

EGAD|l39899|l49200(95 - 1541 of 1946) prtB protein {Lactobacillus delbrueckii} 
GP|l3B1114|gb|AAC41529.l| |L48487 proteinase precursor {Lactobacillus delbrueckii} 
PIR| JC6032| JC6032 lactocepin (EC 3.4.21.96) precursor [similarity] - Lactobacillus 
delbrueckii subsp. bulgaricus 
%Match =15.5 

%Identity =33.3 %Similarity =54.6

Matches = 275 Mismatches = 343 Conservative Sub.s = 176 

318 348 378 408 438 468 498 528 

KAVTVTKPQGAVAEKATPAVPKPQKOTVIVEFNKESILDYATEQKKTO 

I I I I I = 11 : :h :: : I = =1= =1 1 = 11 = 1 =1 = 

SKFQEAAKEQRQASGQAVSKKNESSVRVIVSLNKSAAFDHTSKPTGS7AASV- -KKIEQASDQVKDGQEKVIKQVEE-- - 1 



DSSKVKR-YDAI INGVALDICAQEIEKLKTIADVT^VYVSQEYVQTKPLLSSSGQLIGLPEVWNNSQYKGEGTWAVIDS 

:||:| = = = l : = |: =1 = 1 = 1 = 1= I =11 1= 1= = =11 = Mil l = = = ll = 

TGNK^/RRQFGYLWAFSIDMDLDDIDKVKDIiPQVKNVTPVKVYHPT DESADQMAQVQDVWQEQKLKGEGMVISIIDT 



GVDFKHQALKIKEPNRAKYNKTSIEKLIHEKNLKGKFYSEKA/PYGYN^ 

hi ii n= =i= =i i ihhimiim i ii = i= i in ii ii in = 

GIDSSHQDLKLDSGVSTALSKSEVESDKS-KLGHGKYYTEKVPYGYNYADKNDQIVDNGCGEMHGQHVAGIAGAN---GQ 
240 250 260 270 280 290 

1032 1062 1092 1122 1152 1182 1212 1242 

LYGmPNAQIIiAMKVFSDDQQNPTTFTDVWLKALDDAILLKADVVNI»ISLGTPAGFvHEGKDYPELEVIARACKAGIVIAV 

: llll = ll = llllllh= =1=1= l = =l = = I 111 = 11111= =11 1= = =1 = 1 =ll = = = 
WGVAPDAQLLAMKVFSNNAKNSGAYDDDIISAIEDSVKLGADVINt>ISLGSVSSDV--GPSDPQQQAVAKASEAGVINVI 
310 320 330 340 350 360 370 

1272 1302 1326 1356 1386 1415 1656 

AAGNEGNITDGNTYGVECPLAENYDTAL- - IANPALDDNTLRVASMENLKKHAHVLKFKDKKSGTEVTEV AAILLYN 

=111 I 1=1 1= = I = I = = I III II I 

SAGNSG--VAGSTADGNPVNNTGTSELSTVGTPGVTPDALTVASAENSK 

390 400 410 420 

1686 1715 1746 1776 1806 

NAKVKDDLGSQLLVESEAAKFNIARITRSTYNNIia«T3NKIITILTERQA 

I == I = =1 1= = =1 = == I I = 

VTTI^reKDELGGVTFSSNSELKG-AAQVTTQLESNYSVLTKKLKLvDMGLGGADDYT FWLKQQ 

430 440 450 460 470 480 

1824 1854 1884 1914 1944 1974 2004 

IDNSLAGQLSSYSSWGPTPDLRLKPEITAPGGHIFSTVEDNQYADKSGTSMAAPQVAGAAAVLKQY 

llll ll==l ==111111=1 =111111111 1=1 11=1 111111=1 III: |s: I 
KKVRASRLKFGTALIDNSRAGKMSDFTSWGPTPELDFKPEITAPGGKIYSLANDNKYQQMSGTSMASPFVAGSEALILQG 
2058 2088 2115 2145 2175 2205 2235

ITDKKIPV--DNAADFIKLLLIOTAQPII^^ 

| : : : = | | ||::|: : : :|: :| |:|||| :|: Ml I II 1= 

IKKQGLNLSGEELVQFAKNSAMNTSHPVYDTEHTKEI ISP- - -RRQGSGEINVKDAIWNTV- -EVKAANGNGA- - -AALK 
650 660 670 680 690 700

2265 2295 2349 2379 2409 2439 2469 

ELKEKKFKARILLRNFGKTNKTyiISSEA--IADPVDEKGFRTQNSEHLVSKKADAVTRKVTVEAGKTLAVDLDVDYSDA 
|= := | | || ,|| : : : | ::= :| : | ||||, |=: ||: : 

EIG-RQTTFKVTLTNHGKKAQTYAVDNYGGPYTQATEAKSGEIYDTK- I VKGQLTTETPKVTVQPGES- - VTJVSFTLTLP

720 730 740 750 760 770 780 

2499 2526 2556 2586 2616 2646 2676 2706

EALTRNNFLEGYIJSrLK-DTEGVADLHLPFLGFYGSWTEQKAIDAFEGISEIGNGDKKREVQFYVNKETNKTSSTFTTNGM 
:= | 11:111, :: : :| ||:,||,||,: | == | : | || : :, | : : : |

YSFQRQNFWGWGFFjAKDQATPNLVLPYMGFFGSYS-QASVSA-PMLYEGGNSNLIOT 

800 810 820 830 840 850 

2724 2754 2781 2811 2841 2871 2901 2931 

LSLPIYNmVFFSPNSP-FYDKAGVRIAALRNMEWQYSIIDPDTNKEVRVLGRSHDWKIjYRLDYRNSFAMMPDS

I = = III I I = II = =1 I II 1= II » I I 

EGDDYSKYTDPDLIAISPNGDGSRDYAYPVLFFDRNYKEYTETITDAQGNK-VKSLGVGKEGTKDYYSSSSGEWTTHSLD 
870 880 890 900 910 920 930 

2961 2991 3021 3051 3081 3111

IWDGKIKD*IAKGDKQYIYQIICVQLNNKGVGGDGVQIYQYYIKMDNNKPYLSPICDKTTVEKLEDRWK 

III I I llll ||:: :|| I :|:|» I :| I II : I 

KWDGTDADGQWKDGQYIY- - KVEFT- PAIGGQE - QELNI PVKVDSQAPEVSDLQVTKDGKLRLKAKDSGSGLDMTMFVA 
950 960 970 980 990 1000 1010


3159 3189 3219 3249 

KITFKVQDT3IGLKDVYLQSVKYVGGGNNNLDLITPPGFKK 

II II 

AVNGEEQ VDGKSWTKLDKDTOQVAENGKV^FKYQDVYGNESKl^TYEVKNIVKEVAAQPELKLTPDGEGKVKAELA 

1520 1530 1540 1550 1560 1570 1580

SEQ ID 8768 (GBS362N) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 149 (lane 10; MW 63.5kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 182 (lane 9; MW 38kDa) and in Figure
149 (lanes 11 & 12; MW 38kDa). Purified GBS362N is shown in Figure 235, lanes 3 & 4.

GBS362C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 149 (lane 14-16; MW 91kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 155 (lane 18; MW 66.3kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1241 

A DNA sequence (GBSx1318) was identified in S.agalactiae <SEQ ID 3841> which encodes the amino
acid sequence <SEQ ID 3842>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.04 Transmembrane 21 - 37 ( 17 - 38)

Final Results

bacterial membrane --- Certainty=0.2614 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:EAA95000 GB:AB042239 PAa [Streptococcus criceti] 
Identities = 55/166 (33%) , Positives = 81/166 (48%) , Gaps = 24/166 (14%) 

Query: 5 KKTDKFGFRKSKVCRSLCGALLGTVAWSLATAETEIHADEATTSPTTVTKVPQPVQADT 64 

K+ + FGFRKSK+ +SLCGALLGT WS+ A A++ TTS T+ DT 
Sbjct: 2 KRKETFGFRKSKISKSLCGALLGTAIWSV--AGQRALAEDMTTSTTSA TOT 51 

Query: 

Sbjct: 

Query: 121 -DVSQPITTTPPTI NEKTVEI PNLAQDTKKVAPKVTVTPE 159 

VSQ T T+ +EK+ EI D K A + +T E 

Sbjct: 112 VTVSQDETVDKGTVGTSQEADEKSGEI KADYSKQAETIKITTE 154 

No corresponding DNA sequence was identified in S. pyogenes. 

SEQ ID 3842 (GBS222) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 6; MW 22kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1242 

A DNA sequence (GBSx1319) was identified in S.agalactiae <SEQ ID 3843> which encodes the amino
acid sequence <SEQ ID 3844>. This protein is predicted to be CylK. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 3738 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1243 

A DNA sequence (GBSx1320) was identified in S.agalactiae <SEQ ID 3845> which encodes the amino
acid sequence <SEQ ID 3846>. This protein is predicted to be CylJ. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 1143 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9689> which encodes amino acid sequence <SEQ ID 9690> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1244 

A DNA sequence (GBSx1321) was identified in S.agalactiae <SEQ ID 3847> which encodes the amino
acid sequence <SEQ ID 3848>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0913 (Affirmative) < suco

bacterial membrane --- Certainty=0.0000 (Not Clear) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1245 

A DNA sequence (GBSx1322) was identified in S.agalactiae <SEQ ID 3849> which encodes the amino
acid sequence <SEQ ID 3850>. This protein is predicted to be CylI (fabF). Analysis of this protein
sequence reveals the following:

Possible site: 24 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.39 Transmembrane 721 - 737 ( 721 - 738)
INTEGRAL Likelihood = -1.97 Transmembrane 326 - 342 ( 326 - 343)
INTEGRAL Likelihood = -0.43 Transmembrane 534 - 550 ( 534 - 550)

Final Results

bacterial membrane Certainty=0.1956 (Affirmative) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 
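
The INTEGRAL lines above each report one predicted transmembrane segment: a likelihood score, the segment range, and a wider range in parentheses. As an illustration only, the sketch below extracts these fields and orders the segments along the sequence. The `Segment` type and `parse_segments` function are hypothetical names, and labeling the parenthesized range as an "outer" range is an assumption about its meaning.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int  # assumed meaning of the parenthesized range
    outer_end: int


SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return the predicted segments sorted by start position."""
    segments = [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in SEG_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


sample = """
INTEGRAL Likelihood = -2.39 Transmembrane 721 - 737 ( 721 - 738)
INTEGRAL Likelihood = -1.97 Transmembrane 326 - 342 ( 326 - 343)
INTEGRAL Likelihood = -0.43 Transmembrane 534 - 550 ( 534 - 550)
"""
segments = parse_segments(sample)
strongest = min(segments, key=lambda seg: seg.likelihood)  # most negative score
for seg in segments:
    print(seg)
```

In the listings of this document, the ALOM "value" field equals the most negative INTEGRAL likelihood (here -2.39), so `strongest` can be compared against that field as a consistency check.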

A related GBS nucleic acid sequence <SEQ ID 9687> which encodes amino acid sequence <SEQ ID 9688>
was also identified.

There is also homology to SEQ ID 3852. 

A related GBS gene <SEQ ID 8769> and protein <SEQ ID 8770> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 1.08
GvH: Signal Score (-7.5): -5.97
Possible site: 24
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 3 value: -2.39 threshold: 0.0
INTEGRAL Likelihood = -2.39 Transmembrane 712 - 728 ( 712 - 729)
INTEGRAL Likelihood = -1.97 Transmembrane 317 - 333 ( 317 - 334)
PERIPHERAL Likelihood = 3.45 492
modified ALOM score: 0.98

*** Reasoning Step: 3

Final Results 

bacterial membrane Certainty=0. 1956 (Affirmative) < suco 

5 bacterial outGide Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

SEQ ID 8770 (GBS361) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 4; MW 84kDa). 

GBS361-His was purified as shown in Figure 213, lane 5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
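
The INTEGRAL lines in localisation reports such as those of Example 1245 can be read mechanically. The following is a minimal sketch in Python, not part of the original analysis: the function name parse_integral and the regular expression are illustrative assumptions. It extracts the likelihood and the predicted transmembrane segment from each line and computes the segment length.

```python
import re

# Hypothetical pattern (an assumption, not from the original text); it tolerates
# the irregular spacing seen in the printed reports.
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)",
    re.IGNORECASE,
)


def parse_integral(text):
    """Return (likelihood, start, end, length) for every INTEGRAL line in text."""
    segments = []
    for match in INTEGRAL_RE.finditer(text):
        likelihood = float(match.group(1))
        start, end = int(match.group(2)), int(match.group(3))
        # Length is counted inclusively, so 712 - 728 spans 17 residues.
        segments.append((likelihood, start, end, end - start + 1))
    return segments


# Two lines copied from the report for SEQ ID 8770.
report = """
INTEGRAL Likelihood = -2.39 Transmembrane 712 - 728 ( 712 - 729)
INTEGRAL Likelihood = -1.97 Transmembrane 317 - 333 ( 317 - 334)
"""

for segment in parse_integral(report):
    print(segment)
```

In the reports reproduced in this section every fully legible segment spans 17 residues, so the computed length offers a quick consistency check on transcribed coordinates.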

Example 1246 

A DNA sequence (GBSx1323) was identified in S.agalactiae <SEQ ID 3853> which encodes the amino
acid sequence <SEQ ID 3854>. This protein is predicted to be CylF. Analysis of this protein sequence
reveals the following:

Possible site: 44
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3766 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
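
The "Final Results" block of each localisation report lists a certainty per compartment, and the predicted location is the entry with the highest value. As a minimal sketch (assuming Python; the names parse_final_results and best_localisation are hypothetical and not part of the original analysis), such a block could be parsed as follows. The pattern deliberately tolerates stray spaces inside the number, since printed certainties sometimes appear as "0 . 3766".

```python
import re

# Hypothetical pattern: compartment name, optional dashes, certainty, verdict.
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each compartment to a (certainty, verdict) tuple."""
    results = {}
    for match in RESULT_RE.finditer(text):
        # Remove whitespace that may have crept into the printed number.
        certainty = float(re.sub(r"\s+", "", match.group(2)))
        results[match.group(1)] = (certainty, match.group(3).strip())
    return results


def best_localisation(results):
    """Return the compartment with the highest certainty."""
    return max(results, key=lambda name: results[name][0])


# Values taken from Example 1246; the first line keeps the stray spaces on purpose.
report = """
bacterial cytoplasm --- Certainty=0 . 3766 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
print(best_localisation(results), results)
```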

Example 1247 

A DNA sequence (GBSx1324) was identified in S.agalactiae <SEQ ID 3855> which encodes the amino
acid sequence <SEQ ID 3856>. This protein is predicted to be CylE. Analysis of this protein sequence
reveals the following:

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3498 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1248 

A DNA sequence (GBSx1325) was identified in S.agalactiae <SEQ ID 3857> which encodes the amino
acid sequence <SEQ ID 3858>. This protein is predicted to be ABC transporter homolog CylB. Analysis of
this protein sequence reveals the following:

Possible site: 56
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13. Transmembrane 271 - 287 ( 263 -
INTEGRAL Likelihood =-10 Transmembrane 17 - 33 ( 14 -
INTEGRAL Likelihood = -8. Transmembrane 114 - 130 ( 106 -
INTEGRAL Likelihood = -6 Transmembrane 152 - 168 ( 149 -
INTEGRAL Likelihood = -1 Transmembrane 186 - 202 ( 185 -



Final Results

bacterial membrane --- Certainty=0.6562 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9685> which encodes amino acid sequence <SEQ ID 9686>
was also identified.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1249 

A DNA sequence (GBSx1326) was identified in S.agalactiae <SEQ ID 3859> which encodes the amino
acid sequence <SEQ ID 3860>. This protein is predicted to be ABC transporter homolog CylA. Analysis of
this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4122 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9683> which encodes amino acid sequence <SEQ ID 9684>
was also identified. A further related GBS gene <SEQ ID 8771> and protein <SEQ ID 8772> were also
identified. Analysis of this protein sequence reveals homology to membrane protein ABC transporters.

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9085> which encodes the amino 
acid sequence <SEQ ID 9086>. An alignment of the GAS and GBS sequences follows: 



Sbjct: : 

Query: 97 FIELGVGF NPELTGRENVYMNGAtILGFTKD3VDDI«NDIVDFAELHHFMNQ 147 

+ +G+ F + LT EN+ GA+ G +K +V + D+ + ++ Q 

Sbjct: 70 IKDFYRHIGIVFQSNRLDDNLT^/EENLISRGALYGLSKSQVRNRLKDLQTYLDITAIKKQ 129 

Query: 148 KLKNYSSGMQVRLAFSVAI KAQGDVLILDE VLAVGDEAFQRKCNDYFME - RKDSGKTTI L 206 

K + SG+++ +A+ Q +L+LDE D +R D + + S T +L 

Sbjct: 130 KYGSLSGGQKRKVDIARALLPQPSLLLLDEPTTGLDPQSRRDLWDAIAQLNQQSQMTVVL 189 

Query: 207 VTHDMGAVKKYCNRAvLIEDGLVKAYGEPFDVANQYSVDNTETA-EDAMNAEKISVSDIA 265 

+TH + + C+ ++ +G + G+ Q+S N + + +++S++D 

Sbjct: 190 ITHYLEEMSA-CDVLNVLIEGNIYYSGDIKSFIEQHSTrNI^NVVLKPEKSLDQLSIADFV 248 

Query: 266 KDLKVSLISNPRITPNDTITFEVSYEVLKDD 296

K ++S I D 1+ E +V+ D+
Sbjct: 249 N- - KCQVLSEREIVFKD - 1 SVEEMMQVISDN 276

There is also homology to SEQ IDs 358, 482, 644, 686, 1832, 2529, 2720, 3882, 4028, 4104, 4280, 5090,
5498, 6034, 6500.

SEQ ID 8772 (GBS83) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 20 (lane 2; MW 37.6kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 21 (lane 5; MW 62.6kDa) and in Figure
28 (lane 3; MW 62.6kDa).

GBS83-GST was purified as shown in Figure 195, lane 6.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1250 

A DNA sequence (GBSx1327) was identified in S.agalactiae <SEQ ID 3861> which encodes the amino
acid sequence <SEQ ID 3862>. This protein is predicted to be acyl carrier protein homolog AcpC. Analysis
of this protein sequence reveals the following:

Possible site: 56
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3451 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1251 

A DNA sequence (GBSx1328) was identified in S.agalactiae <SEQ ID 3863> which encodes the amino
acid sequence <SEQ ID 3864>. This protein is predicted to be CylG (fabG). Analysis of this protein
sequence reveals the following:

Possible site: 3S
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2651 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 3866.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1252

A DNA sequence (GBSx1329) was identified in S.agalactiae <SEQ ID 3867> which encodes the amino
acid sequence <SEQ ID 3868>. This protein is predicted to be CylD. Analysis of this protein sequence
reveals the following:

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2030 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1253 

A DNA sequence (GBSx1330) was identified in S.agalactiae <SEQ ID 3869> which encodes the amino
acid sequence <SEQ ID 3870>. Analysis of this protein sequence reveals the following:

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3219 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1254 

A DNA sequence (GBSx1331) was identified in S.agalactiae <SEQ ID 3871> which encodes the amino
acid sequence <SEQ ID 3872>. Analysis of this protein sequence reveals the following:

Possible site: 56
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.97 Transmembrane 231 - 247 ( 226 - 251)
INTEGRAL Likelihood = -7.06 Transmembrane ... - 157 ( 134 - 164)
INTEGRAL Likelihood = -2.76 Transmembrane 28 - 44 ( 25 - 44)
INTEGRAL Likelihood = -1.38 Transmembrane 123 - 139 ( 121 - 139)
INTEGRAL Likelihood = -0.32 Transmembrane 199 - 215 ( 199 - 215)

Final Results

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB88836 GB:AL353832 putative integral membrane transport 
protein. [Streptomyces coelicolor A3 (2) ] 
Identities = 68/264 (25%), Positives = 123/264 (45%), Gaps = 10/264 (3%)

Query: 6 RMHFI FIKQYMKQIMEYKIDFFVGVLGVFLTQGLNLIjFLNVLFQHI PSLEGWTFQQIAFI 65
R + + +++ M Y+ F + G F L + + + ++F + +L G++ ++AF+ 
RAYGLIAGWIRSTMAYRTSFALTAFGNFAMTALDFVAILLMFSRVDALGGYSLPEVAFL 93 



Query- 66 YGFSLLPKGIDHLFFDNLWALGQRLIRKGEFDKYLTRPISPLFHVLVETFQVnALGEDLV 125 

YG S + G+ L ++ LG+R +R G D L RP L V + F + LG ++ 
Sbjct: 94 YGLSGVSFGrADLAIGSMERLGRR-VRDGTLDTLLVRPAPVIAQVARDRFALRRLGRWQ 152 

Query: 126 GFIBL-STTVSSISWTVPKVI.I,FIFIIPFATLIYTSLKIATSSIAFWTKQSGAVIYIF- 182 

G ++L + V I WT KVLL + 1+ ++ +A + F + + V F 

Sbjct: 153 GLLVLGYALVVVDIDWTAAKVLLLPVALISGAGIFCAVFVAAGAFQFAAQDASEVANAFT 212 

Query 183 YMFKDFAKYP\7AIYNNLLRWIISFVIPFAFTAYYPAAYFLQDRNVYFNIGGVI LI 237 

Y +yp ++ L +FV+P AF + PA+Y L R ++ G + L 

Sbjct: 213 YGGTTMLQYPPTVFALDLWGATFVLPLAFVNWLPASYVL-GRPYPLDLPGWVAFTPPIA 271 

Query: 238 SLISFMVSLILWHKGVEVYESAGS 261 

+ ++ + W G+ Y S GS 

Sbjct: 272 AAACCAIAGLAWRAGLRSYRSTGS 295 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3873> which encodes the amino acid
sequence <SEQ ID 3874>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.86 Transmembrane 227 - 243 ( 225 - 251)
INTEGRAL Likelihood = -7.22 Transmembrane 141 - 157 ( 133 - 164)
INTEGRAL Likelihood = -6.37 Transmembrane 123 - 139 ( 114 - 140)
INTEGRAL Likelihood = -2.97 Transmembrane 26 - 42 ( 26 - 49)

Final Results

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB88836 GB:AL353832 putative integral membrane transport 
protein. [Streptomyces coelicolor A3 (2) ] 
Identities = 69/262 (26%) , Positives = 125/262 (47%) , Gaps = 10/262 (3%) 

IKQYLKQIMEYKVDFWGVLGVFLTQGLNLLFLSVLFQHIPSLEGWTFEQIAFIYG 67 
+ + +++ M Y+ F + G F L+ + + ++F + +L G++ ++AF+YG 
YGLIAGMWIRSTMAYRTSFALTAFGNFAMTALDFVAILLMFSRVDALGGYSLPEVAFLYG 95 

Query 68 FCLIPKGIDHLFFDNLWALGQRLWKGEFDKYLTRPISPLFHVLVETFQVDALGELLVGV 127 

+ G+ L ++ LG+R VR G D L RP L V + F + LG ++ G+ 
Sbjct: 96 LSGVSFG^^IGSMERLGRR-VRTOTIJJTLLVRPAPVLAQVAADRFALRRLGRVVQGL 154 

Query 128 ILL--VTTAGSIVWTLPKvLLFILVIPFATLIYTSLKIATASISFWTKQSGAVIYIF-YM 184 

++L I WT KVLL + + I++++A + F+ +V FY 

Sbjct: 155 LVLGYALWVDIDWTAAKVLLLPVALISGAGIFCAVFVAAGAFQFAAQDASEVANAFTYG 214 

Query 185 FNDFSKYPMSIYHSFLRWLISFIIPFAFTAYYPASYFLTGQHLLFHIGGLV WSL 239 

+Y p +++ L +F++P AF + PASY L G+ ++ G V + + 

Sbjct: 215 GTTMLQYPPOTFALDLWGATFVLPIAFVNWLPASYVL-GRPYPLDLPGWAFTPPLA^ 273 

Query: 240 LVLALSLKLWKWGLDAYESAGS 261 

AL+ W+ GL +Y S GS 
Sbjct: 274 ACCALAGLAWRAGLRSYRSTGS 295 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 208/261 (79%) , Positives = 238/261 (90%) 
Query. 1 MTKYQjUffiFIFIKQYMKQIMEY^ 60 



Query: 8 HAIFIL._ 

M Y+ F 

Sbjct: 36

M K + MH IFIKQY+KQIMEYK+DF VGVLGVFLTQ3LNLLFL+VLFQHIPSLEGWTF+
MAiajRCMHAIFIKQYLKQIMEYKOTEVVGVIfiWLTC^IJSn^FLSVLFQHIPSLEGWTFE SO 



QIAFIYGF L+PKGIDHLFFDNLWALGQRL+RKGEFDKYLTRPISPLFHVLVETFQTOAL 



GELLVG ILL TT SI WT+PKVLLFI + IPFATLIYTSLKIAT+SI+FWTKQSGAVIY 



IFYMFNDF+KYP++IY++ LRW + 1 SF+ 1 PFAFTAYYPA+ YFL +++ FNIGG++++SL+ 



+SL LW G++ YESAGS 
ALSLKLWKWGLEAYESAGS 2 SI 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1255 

A DNA sequence (GBSx1332) was identified in S.agalactiae <SEQ ID 3875> which encodes the amino
acid sequence <SEQ ID 3876>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence






INTEGRAL Likelihood = ... Transmembrane 147 - 163
INTEGRAL Likelihood = ... Transmembrane 119 - 135
INTEGRAL Likelihood = ... Transmembrane 238 - 254
INTEGRAL Likelihood = ... Transmembrane 61 - 77
INTEGRAL Likelihood = ... Transmembrane 27 - 43 ( 27 - ...)
INTEGRAL Likelihood = ...



Final Results

bacterial membrane --- Certainty=0.7241 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB88837 GB:AL353832 putative integral membrane protein.
[Streptomyces coelicolor A3 (2) ]
Identities = 60/271 (22%), Positives = 118/271 (43%), Gaps = 13/271 (4%)

Query: 6 RRYKPFISTGIQGLITYRVDFILYRIGDVIGAFVAFYLWKAVFDSSSQSLIQGFQLSDMI 65 

R Y + G + TYR 4- + + Y + A++D Q + G+ + + 

Sbjct: 7 RLYVAVARGGFRRYATYRAaTAAGVFTNTVFGL I LVYTYLALWDEKPQ - - LGGYDQAQAV 64 

Query: 66 LYIIMS-FVTNLLTRTDSSFM--IGDEVl<D3SIimLLRPVHFAASYLFMEIGSRWLIFL 122 

++ + + L F + + ++ G + + L RP +L ++G L 

Sbjct: 65 TFVWLGQALI^AaLAIGGGGFEDELMERIRTGDVAVDLYRPADLQLWWIAADVGRAVFQLL 124 

Query: 123 SIGV-PFLLVITGVRLFLGTDLIQAIVLWFYIISIILAFLINFFFNICFGFSAFVFKNL 181 

GV PF+ LF L + + + ++++++1A ++ F SAF + 

Sbjct: 125 GRGWPFVFG SLFFPVALPREVSVWAAFLVAVVIAMWGFALRYLVALSAFWLLDG 180 

Query: 182 WGSNLLKNSLVAFMSGSLIPLTFFPKIVADILGFLPFSSLI YTP VMI I IGKYDGSQIVQA 241 

G + F SG L+PL FP ++ D++ LP+SSL+ P +++G+ D 4 
Sbjct: 181 TGVTQMAWIAGLFCSGMLLPI^NVFroVIjGIWvRALPWSSLLQGPADVLLGEADP LGT 237 

Query: 242 LLLQIFWLIVMVALSQLIWKKVQLHITIQGG 272 

L Q W + ++AL +L+ + +QGG 

Sbjct: 238 YDFQASWAVALLALGRLVQSARTRRVVVQGG 268

A related DNA sequence was identified in S.pyogenes <SEQ ID 3877> which encodes the amino acid
sequence <SEQ ID 3878>. Analysis of this protein sequence reveals the following:



INTEGRAL Likelihood = -9.18 Transmembrane 252 - 268 ( 248 - 277) 

INTEGRAL Likelihood = -7.22 Transmembrane 161 - 177 ( 151 - 187) 

INTEGRAL Likelihood = -6.10 Transmembrane 133 - 149 ( 128 - 160) 

INTEGRAL Likelihood = -2.81 Transmembrane 213 - 229 ( 211 - 230) 



Final Results

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF11144 GB:AE002002 conserved hypothetical protein [Deinococcus radiodurans]
Identities = 56/268 (20%), Positives = 113/268 (41%), Gaps = 21/268 (7%)



CDGSIIMRLLRPV HFAASYLFMEIG 129
k G++ LL P+ FAA +

Query: 15
Sbjct: 1
Query: 75
Sbjct: 60
Query: 130
Sbjct: 118
Query: 187
Sbjct:
Query: 247
Sbjct: 228



An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/268 (74%) , Positives = 236/268 (87%) 



Query: 5
Sbjct: 19
Query: 65
Sbjct: 79
Query: 125
Sbjct: 139
Query: 185
Sbjct:
Sbjct: 259



YIIMSFVT LLT++DSSFMIG+EVKDGSIIMRLLRPVHFAASYLFMEIG RW++ 4S+ 



! PFL+V++G+++ G ++Q + Y++S++LAFLINF+FNICFG SAFVFKNLWGS 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1256 

A DNA sequence (GBSx1333) was identified in S.agalactiae <SEQ ID 3879> which encodes the amino
acid sequence <SEQ ID 3880>. This protein is predicted to be ABC transporter, ATP-binding protein.
Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2013 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9681> which encodes amino acid sequence <SEQ ID 9682>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF09790 GB:AE001882 ABC transporter, ATP-binding protein 
[Deinococcus radiodurans] 
Identities = 141/331 (42%) , Positives = 201/331 (60%) , Gaps = 34/331 (10%) 

Query: 10 MIEVSHLQKNFIKTVKAPGLKGAFQSFLRPEKHTFEAVKDLTFDVPKGQILGFIGANGAG 69 

MIEV HL K+F + AV+D++F +P G+I+G++G NGAG 

Sbjct: 46 MIEVRHLCKSFARK PAVQDISFS I PAGE I VGYLGPNGAG 84 

Query: 70 KSTTIKMLTGILKPTSGFCRIDGKLPQENRQNYWDIGWFGQRTQLWWDLALQETYTVL 129 

KSTTIK+LTG+L P SG .R+ G +P + R+ +V +G VFGQRT LWWDL ++E+ +L 
Sbjct: 85 KSTTIICVLTGLLVPDSGEVRVGGLVPWKQRRQHVARLGAVFGQRTTLWWDLPVRESLELL 144 

Query: 130 KEIYDVPDKEFRKRMAFLNEVLELNDFIKDPWTLSLGQRMRADIAASLLHNPKVLFLDE 189 

+ 4Y VP F + +A E+LEL F+ PR LSLGQRMRAD+AA+LLH+P++LFLDE 
Sbjct: 145 RITTORVPAARFAENIAGFTELLELGPFLNTPARAIjSLGQRMRADLAAALLHDPELLFLDE 204 

Query: 190 PTIGLDVSVKDNIRRAITQINQEEETTILLTTHDLSDIEQLCHRIFMIDRGQEIFDGTVS 249 

PT+GLDV K+ IR + +N E T+LLTTHDL D+E+L R+ MID G+ +FDG ++ 
Sbjct: 205 PWGLDWAKERIREFVKAWAERGVTVLLTTHDLGDVERLARRVmiDTGRLLFDGPLA 264 

Query: 250 QLKETFGKMKTL- -SFDLRPGQEHISS-SLIGKSEINIKRNDLVLDIQYDSSRYQTADII 306 

+L+ +G + L F+ P Q + +L+G+ ++ Y S A I 

Sbjct: 265 ELQARYGGERELWVEFEKAPAQPALPGLTLLGQDGPRVR YGFSGAAAAPIA 315 

Query: 307 QQTLADFSVRDLKMTDADIEDIIRRFYRNEL 337 

Q T A VRDL + + ++E IRR Y L 
Sbjct: 316 QVT-AI1A.PVRDI1AVKEPEVEATIRRIYEGNL 345 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3881> which encodes the amino acid
sequence <SEQ ID 3882>. Analysis of this protein sequence reveals the following:

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3315 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 272/330 (82%), Positives = 305/330 (92%)

Query: 8 MSMIEVSHLQKNFIKTVKAPGLKGAFQSFLRPEKHTFEAVKDLTFDVPKGQILGFIGANG 67

M MIEVSHLQKNF KT+K PGLKGA +SF+ P + FEAVKDL+F+VPKGQILGFIGANG 
Sbjct: 1 MVMIEVSHLQKNFSKTIKEPGLKGALKSFVHPPREIFEAVKDLSFEVPKGQILGFIGANG 60 

Query: 68 AGKSTTIKMLTGILKPTSGFCRIDGKLPQENRQNYVKDIGWFGQRTQLWWDLALQETYT 127 

AGKBTTIKMLTGILKPTSG+CRI+GK+PQ+NRQ W+DIG VFGQRTQLWWDLALQETY 
Sbjct: 61 AGKSTTIKMLTGILKPTSGYCRINGKIPQDNRQYYVRDIGAVFGQRTQLWWDLALQETYV 120 

Query: 128 VLKEIYDVPDKEFRIO^FUffiVLErjroFIKDPTOTLSLGQRMRADIAASLLHKPKVLFL 187 

VLKEIYDVP+K FRKRM FLNEVL+LN+FIKDPWTLSLGQRMRADIAASLLHNPKVLFL 
Sbjct: 121 VLKEIYDVPEKAFRKRMDFI^VLDI^FIKDPWTLSLGQRMRADIAASLLHNPKVLFL 180 

Query: 188 DEPTIGLDVSVKBNIRRAITQIKQEEETTILLTTHDLSDIEQLCHRIFMIDRGQEIFDGT 247 

DEPTIGLDVSVKDNIRRAITQINQEEETTILLTTKDLSDIEQLC RI MID+GQEI FDGT 
Sbjct: 181 DEPTIGLDVSVKDNIRRAITQINQEEETTILLTTHDLSDIEQLCDRIIMIDKGQEIFDGT 240 

Query: 248 VSQLKETFGKMKTLSFDLRPGQEHISSSLIGKSEINIKRNDLVLDIQYDSSRYQTADIIQ 307 

V+QLK++FGKMK+LSF+L+PGQE + S +G +1 ++R++L LDIQYDSSRYQTADIIQ 
Sbjct: 241 VTQLKQSFGKMKSLSFELKPGQEQWSQFM3LPDITVERHELSLDIQYDSSRYQTADIIQ 300 

Query: 308 QTLADFSVRDLKMTDADIEDIIRRFYRNEL 337 

+T4-ADF+VRD+KMTD DIEDI+RRFYR EL 
Sbjct: 301 KTMADFAVRDVKMTDVD I EDIVRRFYRKEL 330 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
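
The summary line that heads each alignment (for example "Identities = 272/330 (82%), Positives = 305/330 (92%)") gives matched positions over aligned length. A minimal sketch in Python, with the hypothetical helper name alignment_stats, shows how the counts can be extracted and a percentage recomputed. Rounding down with integer division is an assumption about how the printed figure was derived, not something stated in the text.

```python
import re

# Hypothetical pattern: statistic name followed by "matched/length".
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def alignment_stats(line):
    """Return {name: (matched, length, percent)} for one summary line."""
    stats = {}
    for name, matched, length in STAT_RE.findall(line):
        matched, length = int(matched), int(length)
        # Integer division truncates, e.g. 272/330 = 82.4% is reported as 82.
        stats[name] = (matched, length, 100 * matched // length)
    return stats


# Summary line of the GAS/GBS alignment in Example 1256.
line = "Identities = 272/330 (82%), Positives = 305/330 (92%)"
print(alignment_stats(line))
```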

Example 1257 

A DNA sequence (GBSx1334) was identified in S.agalactiae <SEQ ID 3883> which encodes the amino
acid sequence <SEQ ID 3884>. This protein is predicted to be Fmt. Analysis of this protein sequence
reveals the following:

Possible site: 32
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.39 Transmembrane 21 - 37 ( 8 - 39)
INTEGRAL Likelihood = -7.75 Transmembrane 360 - 376 ( 359 - 381)

Final Results

bacterial membrane --- Certainty=0.4758 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8775> which encodes amino acid sequence <SEQ ID 8776>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 8.85
GvH: Signal Score (-7.5): -3.75
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 2 value: -9.39 threshold: 0.0
INTEGRAL Likelihood = -9.39 Transmembrane 21 - 37 ( 8 - 39)
INTEGRAL Likelihood = -7.75 Transmembrane 353 - 369 ( 352 - 374)
PERIPHERAL Likelihood = 4.24 92
modified ALOM score: 2.38

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4758 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA24012 GB:AB009635 Fmt [Staphylococcus aureus]
Identities = 72/279 (25%), Positives = 125/279 (43%), Gaps = 25/279 (8%)

LHRFMRKNNVNGMMI VSDNTGKP I TI SHGINRGEVETDIEN - -NKLFPMASLQKLMTGII 106 
+ ++++ + NG + + +N GK + +S G + E I+N N +F + S OK TG++ 
IDKYLQSSLFNGSVAIYEN-GK-LKMSKGYGYQDFEKGIKOTPNTMFLIGSAQKFSTGLL 136 



■ +D +S++ P K S I 4 



K K Y++ N+ L + +VTG++YAE I +PL+L T Y 



■-DEYTKHQNDAISHYYGGLYMHGRIVNSNGTFF 312 

+ TK D Y G Y + NG FF 
jHEFGTKQYPD EYRYGFYAKPTLNRLNGGFF 347 



Query:
Sbjct:
Query: 107
Sbjct: 137
Query: 167
Sbjct: 195
Query: 226
Sbjct: 253
Query: 281
Sbjct: 312



There is also homology to SEQ ID 3886.

A related GBS gene <SEQ ID 8773> and protein <SEQ ID 8774> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: 14.89
GvH: Signal Score (-7.5): -3.75
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -9.39 threshold: 0.0
INTEGRAL Likelihood = -9.39 Transmembrane 14 - 30 ( 1 - 32)
PERIPHERAL Likelihood = 4.24 85
modified ALOM score: 2.38

Final Results

bacterial membrane --- Certainty=0.4758 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

29.6/49.6% over 218aa
Bacillus cereus
GP|4127525| D-stereospecific peptide hydrolase Insert characterized
ORF00162(478 - 1033 of 1644)
GP|4127525|emb|CAA09676.1||AJ011526(67 - 285 of 389) D-stereospecific peptide hydrolase
{Bacillus cereus}
%Match = 5.8
%Identity = 29.5 %Similarity = 49.5
Matches = 62 Mismatches = 96 Conservative Sub.s = 42

330 360 390 420 450 480 510 540 

MILRRLFMWKFLKSLLSLFLIAVIATGISVACFFFIPEIIKGNITPILLHRFMRKNNWGMMIVSDNTGKPITI 

: I :|: : : . | |.< = || : || 

TCASIlALLIAGSSLLYTTPTSIVKAEPTQNVSSSLQTl'^^QRDRTSv^<QA^IRDTLQLGYPGIIlAKTSEGGKTWGYAAGIAD
20 30 40 50 60 70 80 

570 600 630 660 705 735 753
GEVETDIENNKLFPMASLQKLMTGI IIQRLIDQDVLSEDDRLSQFFPQV KG- -SNSITIHQLLTHTSGL REKG

: := = | : |: | | == :|: == | || = : = | | | I ||| = = :| lllh I I 
LRTKKPMKTDFRFRIGSVTKTFTATWLQLVGENRLKLDDH-EDWLPGVIQGNGYDGNKITIQEILNHTSGIAEYSRSKD 
100 110 120 130 140 150 160 


807 834 854 894 924 954 978 

VKVS PYLKN- - EREQLQFCIiKHY - NFVNKKSWYYSNINFS FLTG I ATQVTGRTYAELVDDVIKNPLRLDDT - - QS YQSW 
I = 1= I = = . =1 I Mil = =1 = =111 =111 I = = I II I =1 11 = 

VDFTDTKKSYTAEELVKMGISFPPDFAPGKGWSYSNTGYVLLGILIEKVTGNSYAEEVENRIIEPLELSNTFLPGNSSVI 
180 190 200 210 220 230 240

993 1023 1053 1083 1113 1143 1173 1203 

- - -NH - -DLVSPMRKNGKLNKINI FNQVSTAYGAGDFFTTPLNFW\7LMRSFSKGYFFPTDEYTKHQNDAI SHYYGGLYMH 
II II =1 = =1 I ID :| :: : : | :: = : | 

PGTNHARGWQP-DGASELKDVTYYN-PSAGSSAGDM1STADDLNKFFSYLLGGKLLKEQQLKQMLTTVPTGKEGIDGYG
260 270 280 290 300 310 320 

SEQ ID 8776 (GBS61) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 33 (lane 3; MW 68kDa). 

GBS61-GST was purified as shown in Figure 195, lane 5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1258 

A DNA sequence (GBSx1335) was identified in S.agalactiae <SEQ ID 3887> which encodes the amino
acid sequence <SEQ ID 3888>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2398 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1259 

A DNA sequence (GBSx1336) was identified in S.agalactiae <SEQ ID 3889> which encodes the amino
acid sequence <SEQ ID 3890>. Analysis of this protein sequence reveals the following:

Possible site: 28
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.57 Transmembrane 16 - 32 ( 13 - 33)

Final Results

bacterial membrane --- Certainty=0.3230 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1260 

A DNA sequence (GBSx1337) was identified in S.agalactiae <SEQ ID 3891> which encodes the amino
acid sequence <SEQ ID 3892>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3910 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1261 

A DNA sequence (GBSx1338) was identified in S.agalactiae <SEQ ID 3893> which encodes the amino
acid sequence <SEQ ID 3894>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4239 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1262 

A DNA sequence (GBSx1339) was identified in S.agalactiae <SEQ ID 3895> which encodes the amino
acid sequence <SEQ ID 3896>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4349 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1263 

A DNA sequence (GBSx1340) was identified in S.agalactiae <SEQ ID 3897> which encodes the amino
acid sequence <SEQ ID 3898>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4952 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1264 

A DNA sequence (GBSx1341) was identified in S.agalactiae <SEQ ID 3899> which encodes the amino
acid sequence <SEQ ID 3900>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4014 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 9 LIHWEGNSGDKLIEHQTSATGWYYQVDRSFSQPKG 43 

L +WEGNSGDKL+E QT AT WYYQ+++ FSQ G 
Sbjct: 180 LTYWEGNSGDKLIjERQTRATEWYYQIEKGFSQTNG 214 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1265 

A DNA sequence (GBSx1342) was identified in S.agalactiae <SEQ ID 3901> which encodes the amino
acid sequence <SEQ ID 3902>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2036 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1266 

A DNA sequence (GBSxl343) was identified in S.agalactiae <SEQ ID 3903> which encodes the amino 
acid sequence <SEQ ID 3904>. Analysis of this protein sequence reveals the following: 

Possible site: 47 
>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10933> which encodes amino acid sequence <SEQ ID 
10934> was also identified. 

SEQ ID 3904 (GBS153) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 25 (lane 3; MW 22kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 4; MW 47kDa). 

GBS153-GST was purified as shown in Figure 198, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1267

A DNA sequence (GBSx1344) was identified in S.agalactiae <SEQ ID 3905> which encodes the amino
acid sequence <SEQ ID 3906>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2036 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1268

A DNA sequence (GBSx1345) was identified in S.agalactiae <SEQ ID 3907> which encodes the amino
acid sequence <SEQ ID 3908>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2570 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA59773 GB:X85787 tasA [Streptococcus pneumoniae] 
Identities = 18/33 (54%), Positives = 28/33 (84%)

Query: 2 DVQSDENFAFKIFKVAKAKGLSLDVFDKLVGRF 34 

+ QSD+N F++FKV+K KG++LD FD+++GRF 
Sbjct: 320 KYQSDKNPFFEVFKVSKTKGIALDPFDEIIGRF 352 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3909> which encodes the amino acid 
sequence <SEQ ID 3910>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2405 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 18/34 (52%), Positives = 25/34 (72%)

Query: 1 MDVQSDENFAFKIFKVAKAKGLSLDVFDKLVGRF 34 
+DVQSDE+F FK+ KV K+KG+ L+ D+ V F

Sbjct: 31 LDVQSDEDFGFKWKVLKSKGIVLNALDESVCGF 64 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
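
As an illustrative aside, the summary line that heads each alignment (for example "Identities = 18/33 (54%), Positives = 28/33 (84%)") can be split into its counts and printed percentages. This is a minimal sketch; the function name `parse_alignment_summary` and the dictionary keys are assumptions made for the example, and no claim is made about how the alignment program rounds the percentage it prints.

```python
import re
from fractions import Fraction
from typing import Dict

# Matches fields such as "Identities = 18/33 (54%)"; optional spaces
# around "=" and "/" are tolerated.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_summary(line: str) -> Dict[str, dict]:
    """Return the count, aligned length, printed percentage and exact
    fraction for every field present in an alignment summary line."""
    summary = {}
    for name, count, length, percent in _FIELD.findall(line):
        summary[name] = {
            "count": int(count),
            "length": int(length),
            "reported_percent": int(percent),
            "fraction": Fraction(int(count), int(length)),
        }
    return summary
```

Keeping the exact fraction alongside the printed percentage lets a reader recompute the value independently, for instance `float(summary["Identities"]["fraction"]) * 100`.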

Example 1269

A DNA sequence (GBSx1346) was identified in S.agalactiae <SEQ ID 3911> which encodes the amino
acid sequence <SEQ ID 3912>. This protein is predicted to be a fimbria-associated protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 52 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.17 Transmembrane 169 - 185 ( 168 - 185)

Final Results 

bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii]

Identities = 53/109 (48%), Positives = 75/109 (68%)

Query: 13 IPKINQDLPIYAGSEEDNLQRGVGHLEGISLEIGGASTHAVLSGQRGMPAARLFADLDKM 72 
IP 1+ DLP+Y G+ +D L +G+GHLEG SLP+GG T +V++G RG+ A +F +LDK+ 
Sbjct: 93 IPSISLDLPVYHGTADDTLLKGLGHLEGTSLPVGGEGTRSVITGHRGLAEATMFTNLDKV 152



Query: 73 KKGDYFYVTNLKETLAyQVDRIMVIEPSQLDAVSIEEDKDYVTLLTCTP 121 
K GD V EL Y+V V+EP + +A+ +EE KD +TL+TCTP
Sbjct: 153 KTGDSLIVEVFGEVLTYRVTSTKWEPEETEALRVEEGKDLLTLVTCTP 201 

There is also homology to SEQ ID 3740 and to SEQ ID 3910. 

SEQ ID 3912 (GBS194) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 177 (lane 2; MW 24kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1270 

A DNA sequence (GBSx1347) was identified in S.agalactiae <SEQ ID 3913> which encodes the amino
acid sequence <SEQ ID 3914>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.15 Transmembrane 880 - 896 ( 876 - 898)
INTEGRAL Likelihood = -4.78 Transmembrane 24 - 40 ( 23 - 42)

Final Results 

bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8777> which encodes amino acid sequence <SEQ ID 8778>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
SRCFLG: 0
McG: Length of UR: 20
Peak value of UR: 2.80
Net Charge of CR: 5
McG: Discrim Score: 10.81
GvH: Signal Score (-7.5): -3.76
Possible site: 29
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -5.15 threshold: 0.0
INTEGRAL Likelihood = -5.15 Transmembrane 867 - 883 ( 863 - 885)
INTEGRAL Likelihood = -4.78 Transmembrane 11 - 27 ( 10 - 29)
PERIPHERAL Likelihood = 7.58 531
modified ALOM score: 1.53
icm1 HYPID: 7 CFP: 0.306

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 859-863

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8778 (GBS104) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 27 (lane 5; MW 95kDa).

GBS104-His was purified as shown in Figure 221, lane 9-10.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1271 

A DNA sequence (GBSx1348) was identified in S.agalactiae <SEQ ID 3915> which encodes the amino
acid sequence <SEQ ID 3916>. This protein is predicted to be a fimbria-associated protein. Analysis of this 
protein sequence reveals the following: 
Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-15.28 Transmembrane 257 - 
INTEGRAL Likelihood = -7.11 Transmembrane 19 - 

Final Results 

bacterial membrane --- Certainty=0.7114 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii]
Identities = 79/178 (44%), Positives = 112/178 (62%), Gaps = 7/178 (3%)

Query: 65 RIALANAYNETLSRNPLL IDPFTSKQKEGLREYARMLEVHEQ- - IGHVAIPSIGV 117 

++ A+AYN+ LS +L + K+ +YA +L+ + + + + IPSI + 

Sbjct: 3 9 QVEQ7AHAYNDALSAGAVLEANNHVPTGAGS SKDSSLQYANILKANNEGLMARLKI PS I SL 98 

Query: 118 DIPiyAGTSETVLQKGSGHLEGTSLPVGGLSTHSVLTAHRGLPTARLFTDLNKVKKGQIF 177 

D+P+Y GT++ L KG GHLEGTSLPVGG T SV+T HRGL A +FT+L+KVK G 
Sbjct: 99 DLPVYHGTADDTLLKGLGHLEGTSLPVGGEGTRSVITGHRGLAEATMFTNLDKVKTGDSL 158 

Query: 178 YVTNIKETIAYKWSIKVVDPTALSEVKIVNGKDYITLLTCTPYMINSHRLLVKGERI 235 

V EL Y+V S KOT+P +++ GKD +TL+TCTP IN+HR+L+ GERI 

Sbjct: 159 IVEVFGEVLTYRVTSTKVVEPEETEALRVEEGKDLLTLVTCTPLGINTHRILLTGERI 216 

There is also homology to SEQ ID 3740. 

SEQ ID 3916 (GBS208) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 5; MW 35kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 8; MW 59.7kDa) and in Figure
160 (lane 5; MW 60kDa).

GBS208-GST was purified as shown in Figure 224, lane 7-8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1272 

A DNA sequence (GBSx1349) was identified in S.agalactiae <SEQ ID 3917> which encodes the amino
acid sequence <SEQ ID 3918>. This protein is predicted to be a fimbria-associated protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.13 Transmembrane 265 - 281 ( 260 - 284) 

Final Results 
bacterial membrane --- Certainty=0.4652 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC13546 GB:AF019629 putative fimbria-associated protein
[Actinomyces naeslundii]
Identities = 96/265 (36%), Positives = 150/265 (56%), Gaps = 10/265 (3%)





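[Alignment rows start at Query: 41 / Sbjct: 15, Query: 100 / Sbjct: 72, Query: 158 / Sbjct: 132, Query: 218 / Sbjct: 192 and Query: 277 / Sbjct: 252. The Query and Sbjct sequence lines are not legible; the surviving match lines are:]

++E A+AYN ++ AGA
+ IP 1+ D4-P+Y 3+A++ L +G+GHLEGTSLPVGGE T
+V+T HRGL A +FTNLDKV GD +E G 4 Y+V KV+ P++ E L V +G+
D +TL+TCTP IN+HR+L+ G+RI P K + K + +A+ + GLI++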



There is also homology to SEQ ID 3740. 

SEQ ID 3918 (GBS209) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 50 (lane 4; MW 62kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 3; MW 37.2kDa). 

GBS209-His was purified as shown in Figure 221, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1273 

A DNA sequence (GBSx1350) was identified in S.agalactiae <SEQ ID 3919> which encodes the amino
acid sequence <SEQ ID 3920>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.66 Transmembrane 281 - 297 ( 276 - 300) 



Final Results 

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04080 GB:AP001508 unknown [Bacillus halodurans] 
Identities = 45/141 (31%), Positives = 63/141 (43%), Gaps = 20/141 (14%)

Query: 153 TGELDLLWGVDGDTKKPIAGWFELYEKKGRTPIRVKNGVHSQDIDAAKHLETDSSGHI 212 

TG L++ KV D DT + L G F LY+ G IR LET G 

Sbjct: 1084 TGSLEVTKV- -DADTGEVLQGATFTLYDSEGEFAIRT LETGEDGKA 1127 
Query: 213 RISGLIHGDyVLKEIETQSGYQIGQRETAVTIEKSKTVTVTIEmCKVPTPKVPSRGGL-I 271

L++GDY+LKE GY +G +T + VT+EN+K +V + G + + 

Sbjct: 1128 TPVNLLYGDYLLKEDSAPEGYLVGINDTQRWIDTVLHEVTVENEKSDINRVSAVGAVQL 1187 

Query: 272 PKTGEQQAMALVIIGGILIAL 292 
K E+ +L G L AL 

Sbjct: 1188 QKVDEETGESL QGALFAL 1205 

Identities = 64/259 (24%), Positives = 113/259 (42%), Gaps = 48/259 (18%)



Query: 16 GTMFGI SQT VLAQETHQLT I VHLEARD I DRPNP QLE IAPKE - GTPIEGVLYQL 67 

G + GI+ T + H++T+ + E DI+R + QL+ +E G ++G L+ L 

Sbjct: 1147 GYLVGIlTOTQRVTIDTVLHEVTVEN-EKSDINRVSAVaAVQLQKVDEETGESLQGALFAL 1205 

Query: 68 YQLKSTEDGDLIAHWNSLTITELKKQAQQVFEATTNQQGKATFNQLPDGIYYGL AV 123 

Q E +TI E++ ++A + + G F+L +YL V 

Sbjct: 1206 QQKVDDE FVT1AEMETDEEGIVFAGSLEPGDYQFVELNAPVGYKLDETPW 1256 

Query: 124 KAGEKNRNVSAFLVDriSEDKVIYPKIIWSTGELDLLKVGVDGDTKKPIiAGWFELYEKNG 183 

E++R + ++L ++ + P G + L+KV DD LGFL+G 

Sbjct: 1257 FTVEEDRTET IELQKENHLIP GSVQLVKVDAD - DAANTLEGAEFTLLDGEG 1306 

Query: 184 RTPIRVKNGVHSQDIDAAKHLETDSSGHIRISGLIHGDYVLKEIETQSGYQIGQAETAVT 243 

V+ G L TD +G + ++ L G+Y E + +GY++ T 

Sbjct: 1307 NV---VQEG- LTTDENGQVWTDLKPGEYQFVETKAPAGYELEATPIGFT 1352 

Query: 244 IEKS- - KTVTVTIENKKVP 260 

IE++ + TV +EN +P 
Sbjct: 1353 IERNQQEVATVAVENHLIP 1371 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3920 (GBS52) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 4; MW 30.5kDa). 

GBS52-His was purified as shown in Figure 192, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1274 

A DNA sequence (GBSx1351) was identified in S.agalactiae <SEQ ID 3921> which encodes the amino
acid sequence <SEQ ID 3922>. Analysis of this protein sequence reveals the following: 



Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.26 Transmembrane 554 - 570 ( 551 - 

INTEGRAL Likelihood = -0.16 Transmembrane 34 - 50 ( 34 - 

Final Results 

bacterial membrane --- Certainty=0.3506 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8779> which encodes amino acid sequence <SEQ ID 8780>
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: -5.81 
GvH: Signal Score (-7.5): -1.92 
Possible site: 37
>>> Seems to have a cleavable N-terminal signal sequence
ALOM program count: 2 value: -6.26 threshold: 0.0
INTEGRAL Likelihood = -6.26 Transmembrane 527 - 543 (524 - 548)
PERIPHERAL Likelihood = 5.36 194 
modified ALOM score: 1.75 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.3506 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 521-525 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA57459 GB:X81869 orf2 [Lactobacillus leichmannii] 
Identities = 140/505 (27%), Positives = 220/505 (42%), Gaps = 94/505 (18%)

Query: 102 GEVISNYAKLGDNVKGLQGVQFKRYKVKTDI SVDELKKLTTVEAADAKVGTILEE 156 

GE+4+++ G LGVFKYV SD+T +DAK L 

Sbjct: 58 GEIMNDFGGTG LNGVTFKAYNVTDHYLSLRKSGDSAQDAVTAIQSDAKDSDNLPS 112 

Query: 157 - -GVSLPQKTNAQGLWDAL DSKSNVR-YLYVEDLKNSPSNITKAYAVPFV 204 

G ++ +T A D + DS N + YL+VE +SP+++T+ A P V 

Sbjct: 113 YAGSAIATETTATSKGEDGIAAFDNLNLKDSDGNYQTYLFVET- -DSPTDVTQQ-AAPIV 169 

Query: 205 LELPVANSTGTGFLS-EINIYPKNWTDEPKTDKDVKKLGQDDAGYTI G 252 

L +P+ ++ T ++ +1 IYPKNV +PTKD+++DT+ G 
Sbjct: 170 LTMPIYKTSDTSAINHDIQIYPKNVKST-PIT-KDLDEASKKDLAVTLPDGSTIYNAQYG 227 

Query: 253 EEFKWFLKSTIPANLGDYEKFEITDKFADGLTYICSVGKIKIGSKTLNRDEHYTIDEPTVD 312 

+ F + + +PN+D + F + DK G+ + + L+ YT+++ 
Sbjct: 228 KS FGYNITVNVPWNI KDKDTFNWDKPDTGI DIDASTVSIDGLTKSTDYTVNK 280 

Query: 313 NQNTLKITFKPEKFKEIAELLKGMTLVKNQDALDKATANTDDAAFLEIPVASTINEKAVL 372 

N ++ FK + h G +L I +T+ A 

Sbjct: 281 KDNGYQWFKTTS- -AAVQALAGKSLT ITYKATLTNNATP 318 

Query: 373 GKAIENTFELQYDHTPDKADNPKPSNPPRKPEVHTGGKRFVKKDSTETQTLGGAEFDLLA 432 

KAINTL++ SPP ++TGG +FVKKDS +TL GAEF L+ 

Sbjct: 319 DKAIGNTATLSIGNGTNIT STPANGPRIYTGGAQFVKKDSQSNKTLAGAEFQLVK 373 

Query: 433 - - SDGTAVKWTDALI KANTNKNYI AGEAVTGQPI KLKSHTDGTFE I KGLAYA VDANAEGT 490 

S+G V + + N A EA T S +G +KGL+Y ++ + 
Sbjct: 374 VDSNGNIVSYATQASDGSYTWNDSATEATT YTSDANGLVALKGLSY SDKLDS 425 

Query: 491 AVTYKLKETKAPEGYVI PDKEIEFTVSQTSYNTKPTDITVDSADATPDTI KNNKRPS IPN 550 

+Y L E +AP+GY D ++F+++Q S+ D+ TI N K +P+ . 

Sbjct: 426 GES YALLE I QAPDGYAKLDSP VKFS ITQGSF GDSNKITIDNTKEGLLPS 474 

Query: 551 TGGIGTAIFVAIGAAVMAFAVKGMK 575 

TGG G IF+AIG +M A G K 
Sbjct: 475 TGGKGIYIFLAIGIVIMIVAFGGYK 499 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8780 (GBS80) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 6; MW 56.8kDa). 

The GBS80-His fusion product was purified (Figure 104A; see also Figure 194, lane 5) and used to 
immunise mice (lane 1+2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure
104B), FACS (Figure 104C), and in the in vivo passive protection assay (Table III). These tests confirm
that the protein is immunoaccessible on GBS and that it is an effective protective immunogen. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
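
The "LPXTG motif" positions reported above are 1-based, inclusive residue ranges. Purely as an illustration, such a motif can be located in a protein sequence with a regular expression. The sketch below assumes the strict pattern L-P-x-T-G (x being any residue); the function name `find_motif` and the short test sequence are hypothetical, and the pattern is left as a parameter because cell-wall anchor motifs of this family may deviate from the strict consensus.

```python
import re
from typing import List, Tuple


def find_motif(sequence: str, pattern: str = "LP.TG") -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of each
    non-overlapping match of the motif pattern in the sequence."""
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the function.
    print(find_motif("MKAAALPETGKK"))
```

A broader pattern, for example `"[LI]P.TG"`, can be passed as the second argument when variant motifs are of interest.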

Example 1275 

A DNA sequence (GBSx1352) was identified in S.agalactiae <SEQ ID 3923> which encodes the amino
acid sequence <SEQ ID 3924>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4043 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1276 

A DNA sequence (GBSx1353) was identified in S.agalactiae <SEQ ID 3925> which encodes the amino
acid sequence <SEQ ID 3926>. This protein is predicted to be MsmR. Analysis of this protein sequence
reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.01 Transmembrane 75 - 91 ( 75 - 92) 

Final Results

bacterial membrane --- Certainty=0.1404 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9679> which encodes amino acid sequence <SEQ ID 9680>
was also identified. 

SEQ ID 3926 (GBS360) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 81 (lane 9; MW 74kDa). 

GBS360-GST was purified as shown in Figure 216, lane 8.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1277 

A DNA sequence (GBSx1354) was identified in S.agalactiae <SEQ ID 3927> which encodes the amino
acid sequence <SEQ ID 3928>. Analysis of this protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1762 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3929> which encodes the amino acid 
sequence <SEQ ID 3930>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1640 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 93/98 (94%), Positives = 96/98 (97%)

Query: 1 MDKIIKSISASGAFRSYVtiDSTETVKLAQEiCHHTLSSSTVALGRTLIANQILAANQKGDS 60 

MDKIIKSI+ SGAFR+YVLDSTETV LAQEKH+TLSSSTVALGRTLIANQILAANQKGDS 
Sbjct: 1 MDKIIKSIAQSGAFRAyVLDSTETVALAQEKHNTLSSSTVALGRTLIANQILAANQICGDS 60 

Query: 61 KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK 98

KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK 
Sbjct: 61 KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKK 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1278 

A DNA sequence (GBSx1355) was identified in S.agalactiae <SEQ ID 3931> which encodes the amino
acid sequence <SEQ ID 3932>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MQEVLIIARENHQVTHEHVSILLTCVQELIVEVNQTQPLSREFREKYM 48 

+ EV IIA+ NHQVTHEHVS ILLTC+QELI EV +T PLS +F KYM 
Sbjct: 70 VHEVFI IAKTNHQVTHEHVS ILLTCIQELI KEVEKTGPLSEDFCNKYM 117 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1279 

A DNA sequence (GBSx1356) was identified in S.agalactiae <SEQ ID 3933> which encodes the amino
acid sequence <SEQ ID 3934>. This protein is predicted to be TnpA (orfB). Analysis of this protein 
sequence reveals the following: 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.5248 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9907> which encodes amino acid sequence <SEQ ID 9908> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 9677> which encodes amino acid 
sequence <SEQ ID 9678> was also identified. A further related GBS nucleic acid sequence <SEQ ID
10911> which encodes amino acid sequence <SEQ ID 10912> was also identified.

There is homology to SEQ ID 1336. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1280 

A DNA sequence (GBSx1357) was identified in S.agalactiae <SEQ ID 3935> which encodes the amino
acid sequence <SEQ ID 3936>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4489 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB64982 GB:U43834 Ydr540cp [Saccharomyces cerevisiae] 
Identities = 93/171 (54%), Positives = 121/171 (70%), Gaps = 3/171 (1%)

Query: 1 MRVYENICEELKKEISKTFEKYI^FNNIPENLKDKRIDEVDRTPAANLSYQVGWTNLVLK 60 

MR Y +K+ELK+EI K +EKY EF I E+ KD++++ VDRTP+ NLSYQ+GW NL+L+ 
Sbjct: 1 MREYTSKKELKEEIEKKYEKYDAEFETISESQKDEKVETVDRTPSENLSYQLGWVNLLLE 60 

Query: 61 WEEDERKGLQVKTPSDKFKWNQLGELYQWFTDTYAHLSLQELKAKLNENINSIYAMIDLL 120 

WE E G V+TP+ +KWN LG LYQ F Y S++E +AKL E +N +Y I L 
Sbjct: 61 WEAKEIAGYNVETPAPGYKWNNLGGLYQSFY:<KYGIYSIKEQRAKI;REAVNEVYKWISTL 120 

Query: 121 SEEELFFAHMRCTIADEATKTAlWWKFIHvNTVAPFGTFRTKIRKWKKIV 171 

S++ELF+A RKW AT A W VYK+IH+NTVAPF FR KIRKWK++V 
Sbjct: 121 SDDELFQAGNRKW ATTKAMWPVYKWIHINTVAPFTNFRGKIRKWKRLV 168 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1281 

A DNA sequence (GBSx1358) was identified in S.agalactiae <SEQ ID 3937> which encodes the amino
acid sequence <SEQ ID 3938>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -3.45

Final Results

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8781> which encodes amino acid sequence <SEQ ID 8782> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 8.80
GvH: Signal Score (-7.5): -3.94
Possible site: 28
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -3.45 threshold: 0.0
INTEGRAL Likelihood = -3.45 Transmembrane 7 - 23 ( 2 - 26)

PERIPHERAL Likelihood = 10.40 69 
modified ALOM score: 1.19 



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA68889 GB:Y07615 acid phosphatase [Haemophilus influenzae] 

Identities = 112/245 (45%), Positives = 148/245 (59%), Gaps = 10/245 (4%)

Query: 5 MKKVLVSSLLVLGITITLQTWEAKSPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLE 64 

MK V+ S++ L +T V G YTQ G A + + IS+D+I++SLE 

Sbjct: 1 MKNVMKLSVIAL LTAAAVPAMAGKTEPYTQSGTNAREMLQEQAIHWISVDQIKQSLE 57 

Query: 65 GKKPITVSFDIDDTLLFSSQYFQYGKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 124 

GK PI VSFDIDDT+LFSS F +G++ +PG D+L Q FW+ V D+ SIPK+ A 
Sbjct: 58 GKAPINVSFDIDDTVLFSSPCFYHGQQKFSPGKHDYLKNQDFWNEVNAGCDKYSIPKQIA 117 

Query: 125 KKLIAMHQKRGDKIVFITGRTRGSMYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKP 184 

LI MHQ RGD++ F TGRT G+VD LKF+ V + G + ++ 

Sbjct: 118 IDLINMHQARGDQVYFFTGRT AGKVDGVTPILEKTFNIKNMHPVEFMGSR-ERT 170 

Query: 185 YKYDKSYYIKKYGSDIHYGDSDDDIHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVL 244 

KY+K+ I + IHYGDSDDD+ AA+EAG R IR++R& NST P+P GGYGEEVL 
Sbjct: 171 TKYNKTPAIISHKVSIHYGDSDDDVLAAKEAGVRGIRLMRAANSTYQPMPTLGGYGEEVL 230 

Query: 245 ENSAY 249 
NS+Y 

Sbjct: 231 INSSY 235 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3939> which encodes the amino acid 
sequence <SEQ ID 3940>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.98 Transmembrane 5 - 22 ( 4 - 25) 

Final Results 

bacterial membrane --- Certainty=0.2593 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA68889 GB:Y07615 acid phosphatase [Haemophilus influenzae] 
Identities = 105/237 (44%), Positives = 141/237 (59%), Gaps = 10/237 (4%)

Query: 9 LFTVSFCGIIALPVEASGPKVPYTQEGITA--ISNQATVKLISIADIASSLEGQKPITVS 66
L ++ A+P A G PYTQ G A + + + IS+ I SIiEG+ PI VS
Sbjct: 7 LSVIALLTAAAVPAMA-GKTEPYTQSCSTNAREMLQEQAIHWISvDQIKQSLEGKAPlIIVS 65

[Subsequent rows start at Query: 67 / Sbjct: 66, Query: 127 / Sbjct: 126 and Query: 187 / Sbjct: 179. Apart from the fragments below, their text is not legible.]

RGD4+ F TGRT

IHYGDSD+D+ AAKEAG R IR++RA NST P+P GGYGEEVL NS+Y
raiHYGDSDDDVLAAKEAGVRGIRLMRAANSTYQPMPTLGGYGEEVLINSSY 235

An alignment of the GAS and GBS proteins is shown below. 

Identities = 196/245 (80%), Positives = 216/245 (88%), Gaps = 2/245 (0%) 

Query: 5 ^KVLVSSLLVLGITITLQTVVEAKGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSIiE 64 

MKK S L + + VEA GPKV YTQEG+TA+S N+ V ISI +1 SLE 

Sbjct: 1 MKKEFTSILFTVSFCGIIALPVEASGPKVPYTQEGITAIS--NQATVKLISIADIASSLE 58 

Query: 65 GKKPITVSFDIDDTLLFSSQYFQYGKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 124 

G+KPITVSFDIDDTLLF+SQYFQYGKEY+TPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 
Sbjct: 59 GQKPITVSFDIDDTLLFTSQYFQYGKEYITPGSFDFLHKQKFWDLVAKRGDQDSIPKEYA 118 

Query: 125 KKlIAMHQKRGDKIVFITGRTRGSMYKEGEVDKrAKALAKDFKLDKPIAVNYTGDKPKKP 184 

K+LIAMHQKRGDKIVFITGRTRGSMYK+GE+DKTAK+LAKDFKLDKPIA+NYTGDK KP 
Sbjct: 119 KQLIAMHQKRGDKIVFITGRTRGSMYKKGEIDKTAKSLAKDFKLDKPIAINYTGDKAVKP 178 

Query: 185 YKYDKSYYIKKYGSDIHYGDSDDDIHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEvL 244 

Y+YDK+YYIKK GS IHYGDSD+DI+AA+EAGARPIRILRAPNSTNLPLP+AGGYGEEVL 
Sbjct: 179 YQYDKTYYIKKNGSQIHYGDSDEDINAAKEAGARPIRILRAPNSTNLPLPKAGGYGEEVL 238 

Query: 245 ENSAY 249 

ENSAY 
Sbjct: 239 ENSAY 243 

SEQ ID 8782 (GBS100) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 16 (lane 5; MW 28kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 33 (lane 2; MW 53kDa). 

The GBS100-GST fusion product was purified (Figure 106A; see also Figure 197, lane 4) and used to 
immunise mice (lane 1 product; 9.9µg/mouse). The resulting antiserum was used for Western blot (Figure
106B), FACS (Figure 106C), and in the in vivo passive protection assay (Table III). These tests confirm
that the protein is immunoaccessible on GBS and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1282 

A DNA sequence (GBSx1359) was identified in S.agalactiae <SEQ ID 3941> which encodes the amino
acid sequence <SEQ ID 3942>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3288 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1283 

A DNA sequence (GBSx1360) was identified in S.agalactiae <SEQ ID 3943> which encodes the amino
acid sequence <SEQ ID 3944>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9675> which encodes amino acid sequence <SEQ ID 9676> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



[The database entry header and the Query and Sbjct sequence lines are not legible. Alignment rows start at Query: 12 / Sbjct: 3, Query: 72 / Sbjct: 63, Query: 132 / Sbjct: 123, Query: 192 / Sbjct: 183, Query: 252 / Sbjct: 243, Query: 312 / Sbjct: 303, Query: 372 / Sbjct: 363 and Query: 432 / Sbjct: 423. The surviving match lines are:]

K++ ++KND++E I D++H+G G+AKVDG+ F+ ALPGE +K +V+K++K G+G+
L H++Y+ QL +KQKQV D L +1
T+GM P YRNKAQVPV +G L GF+++ SH ++ +++ +IQ +E D +1 ++L
G +R++V R G T0++M+VL4T ++ +IE++ A P V
SI+QN+N + +NVIFG + + L+G + I D + +AISA+SFYQVN E
f VIDAY GIGTI L +A+Q KHVYGVE+V +A+SDAK NA NG N
AE M W +G++ VI+VDPPRKG E+ + + K D++ Y+SCN AT+ARD
+QPVD+FP T H+E VA+L
A related DNA sequence was identified in S.pyogenes <SEQ ID 3945> which encodes the amino acid
sequence <SEQ ID 3946>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1262 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 332/454 (73%) , Positives = 387/454 (85%) 



Query: 72 VEEYLTTSPHRNEGLDYTYLRTGIADLGHLTYEQQLLFKQKQVADNLYKIAHISDVLVEP 131 

VE Y £ EN ++ TYLRTGIADLGHLTYE QL FK+KQV D+LYKIA ISDV VE 
Sbjct: 68 VEAYHYLSSARNADvNLTYLRTGIADLGHLTYEDQLTFKKKQVQDSLYKIAGISDVTVES 127 

Query: 132 TLGMTIPLAYRNKaQVPVRRVDGQLETGFFRKNSHTLVSIEDYLIQEKEIDALINFTRDL 191 

T+GMT PLAYRNKAQVPVRRV+GQLETGFFRK+SH L+ I DY IQ+KEID LINFTRDL 
Sbjct: 128 TIGMTEPLAYRNKAQVPVRRVNGQLETGFFRKHSHDL1PISDYYIQDKEIDRLINFTRDL 187 

Query: 192 LRKFDVKPYDEEQQSGLIRNLWRRGHYTGQLMLVLVTTRPKIFRIDQMIEKLVSAFPSV 251 

LR+FD+KPYDE +Q+GL+RN+WRRGHY+G++MLVLVTTRPK+FR+DQ+IEK+V APP+V 
Sbjct: 188 LRRFDIKFYDETEQTGLLRNIVVRRGHYSGEMm^ 247 

Query: 252 VSIMQNINDRNSNVIFGKEFRTLYGSDTIEDCtlLGNTYAISAQSFYQvOTEMAEKLYQKA 311 

VSI+QNIND+N+N IFGK+F+TLYG DTI D MLGN YAISAQSFYQVNT MAEKLYQ A 
Sbjct: 248 VSIIQNIMDKNTNAIFGKI^FKTLYGKDTITDSMI^NNYAISAQSFYQVNTVMAEKLYQTA 307 

Query: 312 IDFSDLNSEDIVIDAYSGIGTIGLSVAKQVKHVYGVEWEKAVSDAKENATRNGITNSTY 371 

I FSDL+ +DIVIDAYSGIGTIGLS AK VK VYGVEV+E AV DA++NA NGITN+ + 
Sbjct: 308 IAFSDLSKDDIVIDAYSGIGTIGLSFAKTVKAVYGVEVIEAAVRDAQQNAALNGITNAYF 367

Query: 372 VADSAENAMAKWLKEGIKPWIIWDPPRKGLTESFVYSAAQTKADKITYISCINSATMARD 431 

VAD+AE+AMA W K+GIKP+VI+VDPPRKGLTESF+ ++ KITY+SCN ATMARD 

Sbjct: 368 VADTAEHAMATWAKDGIKPSVILVDPPRKGLTESFIQASVAMGPQKITYVSCNPATMARD 427 

Query: 432 IKLFEELGYHLVKIQPVDLFPMTHHVECVALLVK 465 

IK ++ELGY L K4QPVDLFP THHVECV LL+K 
Sbjct: 428 IKRYQELGYKIAKVQPVDLFPQTHHVECVVLI.IK 461 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
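
The summary line that precedes each alignment can be read mechanically. The following is a minimal Python sketch, not part of the original analysis, that extracts the counts from a line such as "Identities = 332/454 (73%) , Positives = 387/454 (85%)" and recomputes the identity percentage. The names parse_summary and percent are hypothetical, and integer truncation is an assumption that agrees with the 73% identity printed for the GAS/GBS alignment above.

```python
import re

# Matches e.g. "Identities = 332/454", tolerating the irregular spacing of scanned text.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line: str) -> dict[str, tuple[int, int]]:
    """Return {field name: (matched, aligned length)} for one summary line."""
    return {name: (int(num), int(den)) for name, num, den in FIELD.findall(line)}


def percent(matched: int, length: int) -> int:
    """Percentage with integer truncation (an assumption, see the note above)."""
    return 100 * matched // length


summary = parse_summary("Identities = 332/454 (73%) , Positives = 387/454 (85%)")
identity_pct = percent(*summary["Identities"])
print(summary, identity_pct)
```

Recomputing the percentage from the counts is a simple way to catch scanning errors in either the counts or the printed value.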

Example 1284 

A DNA sequence (GBSx1361) was identified in S.agalactiae <SEQ ID 3947> which encodes the amino
acid sequence <SEQ ID 3948>. This protein is predicted to be PSR protein. Analysis of this protein 
sequence reveals the following: 



3 N- terminal signal t 



Likelihood =-12.15 Transmembrane 135 - 151 ( 127 - 

Final Results

bacterial membrane --- Certainty=0.5861 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB76822 GB:AJ276232 PSR protein [Enterococcus faecalis]

Identities = 143/409 (34%) , Positives = 206/409 (49%) , Gaps = 56/409 (13%) 





Query: 48 QRRTESPP--TMSYYEEPYSDSYYQDDDFYSEPQLTSQGLPIYQEERAPKKKKQRARKEK 105

+ R E P S E Y DSY +D T G ++ P+ KK + K+K
Sbjct: 31 EHREEEPEEIAESLQEPVYEDSYTEDSRRSERRHQTDSGGG-NGSDQPPRGKKDKKPKKK 89

Query: 106 QRVKMftPFPPKAITPPRKKKKFKGFLKFlGIlI^IVlSGMVFMFVKGMRDVHNGKSHYS 165

RKK K K F K++ I+L+++ + MF+KG + S
Sbjct: 90

Query: 166 PAIIEDFKGKDAVDGT-NILILGSDKRVSERSTDMTDTIWANVGNICDNKVKMVSFMRD 224

+E F G + +G NILILGSD R + R DTIMV + K K++SFMRD
Sbjct: 132 QEKVETFNGVKSSNGAKNILILGSDTRGEDAG PADTIMVLQLNGPSKKPKLISFMRD 188



Query: 225 LLWIPNYSTEGYTOMKimSFNLGEQDiraKGAEYVRQTLK3raFDlDIKYYVMVDFETFA 284 

V+IP G K+NA+4 G GAE VR+TLK +F++D KYY VDF++F 

Sbjct: 189 TFVDIP GVGPNKINAAYAYG GAELWETLKQNFNLDTKYYAKVDFQSFE 237 

Query: 285 DAIDTLFPNGVKINAKFGLVGGQSADSVKVPDDLRMKHGWPSQKIKVG1QYMDGRTLLN 344 

+D++FP GVKI+A+ L + D V 1+ G Q MDG LL 

Sbjct: 238 KIVDSMFPKGVKIDAEKSL NLDGVD IEKGQQVMDGHVLLQ 277 



Sbjct: 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3949> which encodes the amino acid 
sequence <SEQ ID 3950>. Analysis of this protein sequence reveals the following: 

Possible site: 49 



>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.96 Transmembrane 159 - 175 ( 152 - 180) 

Final Results

bacterial membrane --- Certainty=0.4185 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB76822 GB:AJ276232 PSR protein [Enterococcus faecalis] 
Identities = 140/345 (40%), Positives = 195/345 (55%), Gaps = 41/345 (11%) 

Query: 140 PRSQK RKHKKKGCMKWFFNILGLLLMTVLMGLGLMFAKGVFDISTNKANYJCPAVSQ 195 

PR +K +K +KK K FF L +LL+ + +MF KG + + + V + 

Sbjct: 78 PRGKmKKPKKKRKKSKTKRFFKWLVILLILLFAYSTVMFLKGKSAAEHDDSLPQEKV-E 136 

Query: 196 AFDGQETQDGT-NILILGSDQRVTQGSTDARTDTIMWNVGNHflKKIKMVSFMRDTLINI 254 

F+G ++ +G NILILGSD T+G R DTIMV+ + +KK K++SFMRDT +4l 
Sbjct: 137 TFNGVKSSNGAKNILILGSD TRGEDAGRACTIMVLQLNGPSKKPKLISFMRDTFVDI 193 

Query: 255 PGYSYNDNSYDLKIjNSAFNLGEQEDHHGAEYVRFALKHNFDIDIKyYVMVDFETFAEAID 314 

PG N K+N+A+ G GAE VR LK NF++D KYY VDF++F + +D 

Sbjct: 194 PGVGPN KINAAYAYG GAELVRETLKQNFNLDTKYYAKVDFQSFEKIVD 241 

Query: 315 TLFPNGVKIDAKFATVGGVAVDSVEVPDDLRMKNGWPNQTIEVGEQRMDGRTLLNYARF 374 

++FP GVKIDA+ + + +D V+ IE G+Q MDG LL YARF 

Sbjct: 242 SMFPKGVKIDAEKS LNLDGVD -IEKGQQVMDGHVLLQYARF 281

Query: 375 RKDDEGDFGRTVRQQQVMSAVMSQIKDPTKLFTGSAAIGKIYALTSTNVSFPFWKNGVS 434

R D+EGDFGR RQQQVMSAVMSQ+K+P L ++GK+ ST+V F++ NG S 

Sbjct: 282 RMDEEGDFGRTORQQQVMSAVMSQMKNPMTLLRTPESLGKLVGYMSTDVPVSFMLTNGPS 341 

Query: 435 VLGSGKNGVEHVTIPENGDWVDEYDMYGGQALYIDFDKYQKTLAK 479 

+h GK GVE +++P W YGL+DK +K 

Sbjct: 342 LLIKGKTGVBSLSVPVPDSMNFGESSYA3SILEVDEQKNADAIEK 386 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 273/486 (56%) , Positives = 340/486 (69%) , Gaps = 32/486 (6%) 
Query: 1 MSR]^GQI^HEELRYNYLLKNIHYLNEREra4EFQYLHYKKTAVRPQRRTESPPTNSYY 60 



Query: 61 EEPY-SDSYY QDDDFYSEPQLTSQGLPIYQEERAPKKKXQRARKEKQRVKV 110 

+EPY D YY +DDD + GLPIY +E KK K R + 

Sbjct: 61 QEPYFEDDYYNDYSPNDLLEDDDVNHDSSFVPYGLP1YPKEDRYIJSIKKT---KLTARRPI 117 

Query: 111 MAPFP PKAITPPRKKIOC-FKGFLKFIGIILLIVLSGMVFMFVK 152 

AP P P++ KKK K F +G++L+ VL G+ MF K 

Sbjct: 118 DAPQPIDEDDAFLTESVARCALPRSQKRKHKKKGCMKWFFNILGLLLMTVLMGLGLMFAK 177 

Query: 153 GMRDVNNGKSHYSPAIIEDFKGKDAVDGTNILILGSDKRVSERSTDARTDTIMVANVGNK 212 

G+ D++ K++Y PA+ + F G++ DGTNILILGSD+RV++ STDARTDTIMV NVGN 
Sbjct: 178 GVFDISTNKANYKPAVSQAFDGQETQDGTNILILGSDQRVTQGSTDARTDTIMVVNVGNH 237 

Query: 213 DNKVKMVSFMRDLLVNIPNYS-TEGYYDMiamSFN^^ 271 

K+KMVSFMRD L+NIP YS + YD+KLN++FNLGEQ+4-H GAEYVR+ LK++FDID 
Sbjct: 238 AKI<!II<2WSFMRDTLINIPGYSYHDNSYDLK]jNSAFNLGEQEDIfflGAEYVRRALKHNFDID 297 

Query: 272 IKyYVMVDFETFADAIDTLFPNGWINAKFGLVGGKJSADSVKVPDDLRMKNGWPSQKIK 331 

1KYYVMVDFETFA+AIDTLFPNGVKI+AKF VGG + DSV+VPDDLRMKNGWP+Q 1 + 
Sbjct: 298 IKYYVMVDFETFAEAIDTLFPHGVKIDAKFATVGGVAVDSVEVPDDLRMKNGVVPNQTIE 357 

Query: 332 VGIQYMDGRTLLNYARFRKDDDGDFGRTQRQQQVMRAIVSQIKDPRRLFTGSAAIGKAYA 391 

VG Q MDGRTLLNYARFRKDD+GDFGRT RQQQVM A++SQIKDP +LFTGSAAIGK YA 
Sbjct: 358 VGEQRMDGRTLDNYARFRKDDEGDFGRTVRQQQVMSAVMSQIKDPTKLFTGSAaiGKIYA 417 

Query: 392 LTSSNLSYSFVLTDGIPILSDAKNGIKQMTIPREGDWVDDYDQYGGQGLTIDFAKYKKIL 451 

LTS+N+S+ FV+ +G+ +L KNG++ +TIP GDWTO+YD YGGQ L IDF KY+K L 
Sbjct: 418 LTSTIWSFPFVVKNGVSVLGSGKMGVEHVTIPENGDWvDEYDMYGGQALYIDFDKYQKTL 477 

Query: 452 KKMGLR 457 
K+GLR 

Sbjct: 478 AKLGLR 483 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1285 

A DNA sequence (GBSx1362) was identified in S.agalactiae <SEQ ID 3951> which encodes the amino
acid sequence <SEQ ID 3952>. This protein is predicted to be shikimate kinase (aroK). Analysis of this
protein sequence reveals the following:

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA55181 GB:X78413 shikimate kinase [Lactococcus lactis]
Identities = 65/164 (39%), Positives = 98/164 (59%), Gaps = 8/164 (4%)

Query: 1 MPKVLLGFMGVGKTSVANCLENEVIDMDSLIEKHIGMSISRFFTEEGEASFRALESQFLN 60 

M +L+GFMG GK++VA L E D+D LIE+ I M 1+ FF GEA FR +E++ 
Sbjct: 1 MSIILIGFMGAGKSTVAKLLAEEFTDU3KLIEEEIEMPIATFFELFGEADFRKIENEVFE 60 

Query: 61 ELLKKKNEGLVIASGGGI VLLEENRRLLTLNRHNNI L -LTGS FE VLYHRI KKDEKNRRPL 119 

++K ++IA+GGGI+ E + L L+R + ++ LT F+ L+ RI D +N RP 
Sbjct: 61 LAVQK DI I IATGGGI I - - ENPKNLNVLDRASRWFLTADFDTLWKRISMDWQNVRP- 114 

Query: 120 FLNHSKEEFYDIYQKRMIjLYSGLSDMIIDTDYLTPQKIATVIGE 163 

L KE +++KRM YS ++D+ ID +P++IA I E 
Sbjct: 115 -LAQDKEAAQLLFEKRMKDYSLVADLTIDVTDKSPEQIAEQIRE 157 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3953> which encodes the amino acid 
sequence <SEQ ID 3954>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA55181 GB:X78413 shikimate kinase [Lactococcus lactis] 
Identities = 63/160 (39%) , Positives = 97/160 (60%) , Gaps = 5/160 (3%) 

Query: 1 MTKVLLGFMGVGKTTVSKHLSMHCKDMDAIIEAKIGMSIAAFFEQHGEIAFRTIESQVLK 60 

M+ +L+GFMG GK+TV+K L+ D+D +IE +1 M IA FFE GE FR IE++V + 
Sbjct: 1 MSIILIGFMGAGKSTVAKLLAEEFTDLDKLIEEEIEMPIATFFELFGEADFRKIENEVFE 60 

Query: 61 DLLFANDNSIIVTGGGVVVLQENRQLLRKNHQHNILLVASFETLYQRLKHDKKSQRPLFL 120 

L +11 TGGG++ +N +L + + L A F+TL++R+ D ++ RP L 

Sbjct: 61 --LAVQKDIIIATGGGIIF^P[^NIjNVLDR-ASRWFLTADFDTLWKRISMDWQNVRP--L 115 

Query: 121 KYSKEAFYEFYQQRMVFYEGLSDLVIRVDHRTPEEVANII 160 

KEA +++RM Y ++DL I V ++PE++A I 
Sbjct: 116 AQDKEARQLLFEKRMKDYSLVADLTIDVTDKSPEQIAEQI 155 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/161 (54%), Positives = 120/161 (73%), Gaps = 1/161 (0%)

Query: 1 MPKVLLGFMGVGKTSVANCLENEVIDMDSLIEKHIGMSISRFFTEEGEASFRALESQFLN 60 

M KVLLGFMGVGKT+V+ L DMD++IE IGMSI+ FF + GE +FR +ESQ h 

Sbjct: 1 MTK^LLGFMGVGKTTVSKHLSMHCKDMDAIIEAKIGMSIAAFFEQHGEIAFRTIESQVLK 60 

Query: 61 ELLKKKNEGLVIASGGGIVLLEENRRLLTI^JRHA^ILLTGSFE VLYHRI KKDEKNRRPLF 120 

+LL N+ +1 +GGG+V+L+ENR+LL N +NILL SFE LY R+K D+K++RPLF 
Sbjct: 61 DLLFA-KENSIIVTGGGWVLQENRQLLRKNHQHNILLVASFETLYQRLKHDKKSQRPLF 119 

Query: 121 LNHSKEEFYDIYQKRMLLYSGLSDMIIDTDYLTPQKIATVI 161 

L +SKE FY+ YQ+RM+ Y GLSD++I D+ TP+++A +1 
Sbjct: 120 LKYSKEAFYEFYQQRMVFYEGLSDLVI RVDHRTPEEVANI I 160 

SEQ ID 3952 (GBS152) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 25 (lane 2; MW 20kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 37 (lane 2; MW 45.5kDa).






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1286 

A DNA sequence (GBSx1363) was identified in S.agalactiae <SEQ ID 3955> which encodes the amino
acid sequence <SEQ ID 3956>. This protein is predicted to be 3-phosphoshikimate 1- 
carboxyvinyltransferase (aroA). Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.81 
INTEGRAL Likelihood = -0.06 



Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9673> which encodes amino acid sequence <SEQ ID 9674> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45819 GB:AF169483 5-enolpyruvylshikimate-3-phosphate synthase
[Streptococcus pneumoniae]
Identities = 288/426 (67%) , Positives = 347/426 (80%) 





Query: 5
Sbjct: 1

Sbjct: 61

Query: 125
       PMDR+ LPL KMG ISG T RDLPPL+L+GTK L+PI Y LP+ASAQVKSAL+ FAALQ
Sbjct: 121

Query: 185
       TRNHTEDM++QFGGHL + K+I + G Q L GQ + VPGDISSAAFW+
Sbjct: 181

Query: 245
       VAGLI PNS ++L+NVGINETRTGI+DV+ MGGK++++ H-D KSATL V+ S L+ T
Sbjct: 241

Query: 305
       I GA+IPRLIDELPI IALLATQAQG TVI DA+ELKVKETDRIQW ++L MGADIT
Sbjct: 301

Query: 365
       TADGMII+G + LH A ++ GDHRIGMM AIAALLV +GEV+L EAINTSYP+F -t
Sbjct: 361

Query: 425
Sbjct: 421



A related DNA sequence was identified in S.pyogenes <SEQ ID 3957> which encodes the amino acid 
sequence <SEQ ID 3958>. Analysis of this protein sequence reveals the following: 

Possible site: 36 






Final Results 

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD45819 GB:AF169483 5-enolpyruvylshikimate-3-phosphate synthase
[Streptococcus pneumoniae]
Identities = 278/426 (65%) , Positives = 346/426 (80%) 

MKLRTNAGPLQGTIQVPGDKSISHRAVILGAVAKGETRVXGLLKGEDVLSTIQAFRNLGV 63 
MKL+TN L G I+VPGDKSISHR++I G++A+GET+V +L+GEDVLST+Q FR+LGV 
MKLKTNIRHLHGI IRVPGDKS ISHRS 1 1 FGSIAEGETKVYDILRGEDVLSTMQVFRDLGV 6 0 



f ++G G GL AP LNMGNSGTS+RLI-i-G+IAS F V+M GD+SLSKR 



PMDR+ PDK+MGV ISG+T+R PPL+L+G +NL+PI Y LPI+SAQVKSA++ AALQA 



KG + ++EKE TRNHTE+M+QQFGG L VDGK+IT+ GPQ+LT Q++ VPGDISSAAFWL 



I G LIPRLIDELPIIALLATQAQG T IKDA+EL+VKETDRIQW D LNSMGA+I 



TADGMIIKG + L+GA +T+GDHRIGMMTAIAALLV G+V LD+ EAI TSYP+FF D 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 269/424 (63%) , Positives = 331/424 (77%) 

Query: 5 MKLLTNANTLKGTIRVPGDKSI SHRAI I FGS I SQGVTRI VDVLRGED VLSTIEAFKQMGV 64 

MKL TNA L+GTI+VPGDKSISHRA+I G++++G TR+ +L+GEDVLSTI+AF+ +GV 
Sbjct: 4 MKLRTNAGPLQGTIQVPGDKSISHRAVILGAVAKGETRVKGLLKGEDVLSTIQAFRNLGV 63 

Query: 65 DIEDDGEIITIYGKGFAGLTQPIOTLLDMGNSGTSMRLIAGVLAGQEFEVTMVGDNSLSKR 124 

IE+ + + I G+GF GL P Ii+MGNSGTSMRLIAG+LAGQ F V M+GD ELSKR 
Sbjct: 64 RIEEKDDQLVIEGQGFQGLNAPCQTLNMGNSGTSMRLIAGLLAGQPFSVKMIGDESLSKR 123 

Query: 125 P^RIALPLSKMGARISGVTISIRDLPPDKLQGTKKLKPIFYHLPVASAQVKSALIFAALQT 184 

PMDRI PL +MG ISG T+R PPL+LQG + L+PI Y LP++SAQVKSA++ AALQ 
Sbjct: 124 PMDRIWPLKQMGVEISGETDRQFPPIiQLQGNRNLQPITYTLPISSAQVKSAILLAALQA 183 

Query: 185 KGESLIVEKEQTRNHTEDMIRQFGGHLDIKDKEIRLNGGQSLVGQDIRVPGDISSAAFWI 244 

KG + +VEKE TRNHTE+MI +QFGG L + KILGQL Q+I VPGDISSAAFW+ 
Sbjct: 184 KGTTQWEKEITRNHTEEMIQQFGGRLIVDGKRITLVGPQQLTAQEITVPGDISSAAFWL 243 










Query: 245 VAGLIIPNSHIILENVGINETRTGILDWSKMGGKIKLSSVDNQVKSATLTVDYSHLQAT 304

VAGLIIP S ++L+NVG+N TRTGIL+W KMG +1 ++ + +++ V YS+++ T
Sbjct: 244 VAGLIIPGSELLLIOWGVNPTRTGILEVVBKMGAQIVYEDMHKKEQVTSIRVVySMMKGT 303

Query: 305 HISGAMIPRLIDELPIIALlATQAQGTTVIADAQEZjKVICETDRIQVVVSSLKQMGiiDITA 364 

ISG + 1 PRL I DELP I I ALLATQAQGTT I DAQEL+VKETDRIQW + L MGA+I A 
Sbjct: 304 IISGGLIPRLIDELPIIALLATQAQGTTCIKDAQELRVKETDRIQWTDILNSMGANIKA 363 

Query: 365 TADGMIIRGNTPLHAASLDCHGDHRIGMMIAIAALLVKEGEA/DLSGEEAINTSYPHFLEH 424 

TADGMII+G T L+ A+ +GDHRIGMM AIAALLVK+G+V L EEAI TSYP F + 
Sbjct: 364 TADGMIIKGPTVLYGANTSTYGDHRIGMMTAXAALLVKQGQVHLDKEEAIMTSYPTPFKD 423 

Query: 425 LEGL 428 
LE L 

Sbjct: 424 LERL 427 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1287 

A DNA sequence (GBSx1364) was identified in S.agalactiae <SEQ ID 3959> which encodes the amino
acid sequence <SEQ ID 3960>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have an uncleavable N-term signal seq

Likelihood = -1.12 Transmembrane 6 - 22 ( 6-22) 



Final Results 

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF20148 GB:AF208390 actinin-like protein [Entamoeba 
histolytica] 

Identities = 62/236 (26%) , Positives = 107/236 (45%) , Gaps = 38/236 (16%) 

Query: 144 NYNSTNSSNPESMLFYEKQLKTWLSTH KNYYLDYK--VTPIYQNNELIPRKIELK- 196 

N N + N + + L W+++ N+ D+K V + + +1+ + 

Sbjct: 116 NANQQKNVmKEEvVEI^ALIjDWvNSFGI^ 175 

Query: 197 YVGIDKTGKLLPIFIGNKSTQDQFGI STVTLENTSPNATIDYLSGKAQN 245 

+ G+D T ++ K +QF 1 + E P + + Y+S + 

Sbjct: 176 FSGLDNTQMVIDC QKLAYEQFKI PILMDVKBLVCERPDPKSIMTYVSVYKERYEQLL 232 

Query: 246 TVLSAKEQRKL IAKHEEEKRLAEK KVEEEKAAAE7QKKL - EEEQARLAAEAQ - RK 298 

KE+++ IA+ E+E++ E+ + E+E+ A E Q++L EEQ RLA E Q RK 

Sbjct: 233 VEKEQKEEQERIAREEQERKQKEEQERLAREEQERLAREEQERLAREEQERIjAREEQERK 292 

Query: 299 QKEEQARLAAETQKKQETIjVQEQTSQGYKRDYRGRWHRPNGQYASKAEIAAAGLQW 354 

QKEEQ RLA E Q++++ QE+ 4Q +P Q + + AA W 

Sbjct: 293 QKEEQERLAREEQERKQREEQERLNQ QQPTSQQLTFFSVQAAADAW 338 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3961> which encodes the amino acid 
sequence <SEQ ID 3962>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA03161 GB:A49208 unnamed protein product [Streptococcus 
pyogenes] 

Identities = 54/222 (24%), Positives = 93/222 (41%), Gaps = 39/222 (17%) 

Query: 44 HYKNTVSSKLLP--FTANYQLQLGELDNLNRA TFSHIQLQDRHETKDVRTKINYD 96 

+YK +S++ PF + +LDLR T ++++ + + KN + 

Sbjct: 76 YYKTLGTSQITPALFPICAGDILYSKLD3LGRTRTARGTLTYANVEGSYGVRQSFGK-NQN 134 

Query: 97 PVGWHN YQFPYGDG-SKSSWVMTHlGHLVGYQFCGIiISIDEPRNLVAMTAWLNTGAY 149 

P GW Y+ + +G S NR HL+ G + + + A T 

Sbjct: 135 PAGWTGNPNHVKYKIEWmGLSYVGDFWNRSHLIADSLGG DALRVNAVTGTRTQ 188 

Query: 150 SGANDSNPEGMLYYE^LDSWLAMPDFW^YKVTPIYSGNEVVPRQIELQYVGIDSSGE 209 

+ GM Y E R WL + D +h Y+V PIY+ +E++PR + 
Sbjct: 189 NVGGRDQKGGMRYrEQRAQEWLEANRDGYLYYEVAPIYMADELIPRAV 236 

Query: 210 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/245 (47%) , Positives = 166/245 (67%) , Gaps = 4/245 (1%) 

Query: 2 KRKQFIKLGIATLLWISLYTPINIATOTTTFJS1IVTAQEY--KTKENGTLPFKHKRQLVL 59 

K+K + + LL++ ++ A T N+ A + T + LPF QL L 

Sbjct: 5 KQKASLLTAvIiLLSLSITTITVDAARWTYP^^ 64 

Query: 60 GELDDKGRATFAHIQLKVKDEPKKKKVKRLKTTPVGWHNFKPYYITOGTQKAWIjMSRGRLI 119 

GELD+ RATF+HIQL+ + E K R K + PVGWHN++F Y DG++ +W+M+RG L+ 
Sbjct: 65 GELDI>ttNRATFSHIQLQDRHETKI)VRTK-INYDPVGWHigYQFPYGDGSKSSW^WRGHLV 123 

Query: 120 CHQFSGLMStERKNLVPMmflOT^ 179 
+QF GLN+E +NLV MT WLHTG Y+ N SNPE ML+YE +L +WL+ H +++LDYKV 

Sbjct: : 



Query: 180 TPIYQMNEL1PRKIELKYVGIDKTGKLLPIFI-GNKSTQDQFGISTWLENTSPNATIDY 238 

TPIY NE++PR+IEL+YVGID +G+LL I + NK + D+ G++TV LEN++PN +DY 
Sbjct: 184 TPIYSGNEWPRQIELQYVGIDSSGELLTIRLNSNKESIDENGVTTVILENSAPNINLDY 243 


Query: 239 LSGKA 243 

L+G A 
Sbjct: 244 LNGTA 248 



A related DNA sequence was identified in S.pyogenes <SEQ ID 7263> which encodes amino acid sequence
<SEQ ID 7264>. An alignment of the GAS and GBS sequences follows:
Score = 58.9 bits (140), Expect = 2e-11

Identities = 34/103 (33%), Positives = 55/103 (53%), Gaps = 1/103 (0%)

Query: 1 MPFKTNLKAGILLYAMFMASIFLLvXQVYLSQVTALHKEYQAQTDYVKARLIAEIVYQD- 59

M K LKAGILL A+ +A++F LVLQ YL+++ A ++Y +Q + KA L A++ Y+ 
Sbjct: 1 MILKKKLKAGILLQAIVLAAVFTLVLQFYLARILATERQYHSQIEASKAYLTAQLAYKTI 60 

Query: 60 HRYKASNPVFFKGGQVICRERKERWMLIVKLDQQRQYQFEYLK 102 
S +F GG + + V LD+ Y ++ +

Sbjct: 61 EGDSISGKCYFTGGYASYLQEGNYIiQVKvTLDKGGNYNHKFYR 103 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1288

A DNA sequence (GBSx1365) was identified in S.agalactiae <SEQ ID 3963> which encodes the amino
acid sequence <SEQ ID 3964>. This protein is predicted to be enolase (eno). Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3025 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA81815 GB:AB029313 enolase [Streptococcus intermedius] 
Identities = 396/435 (91%), Positives = 414/435 (95%), Gaps = 1/435 (0%) 

Query: 1 MSIITDVYAREVLDSRGNETLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 60 

MSIITDVyAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 
Sbjct: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 60 

Query: 61 GLGTQKAVDNVNNVIAEA1 IGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVS IAVAR 120 

GLGTQKAVDNVNN+ IAEA+ IGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVS IAVAR 
Sbjct: 61 GLGTQKAVDNVNNIIAEAVIGYDWDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 120 

Query: 121 AARDYLEVPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIMPVGAPTFKEALR 180 

AAADYLE+PLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMI+P GAPTFKEALR 
Sbjct: 121 AAADYLEIPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIVPAGAPTFKEALR 180 

Query: 181 WGAEVFHALKKILKERGLETAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240 

WGAE+FHALKKILK RGL TAVGDEGGFAP+F+GTEDGVETIL AIEAAGY G++ + + 
Sbjct: 181 WGAEIFHALKKILKSRGLATAVGDEGGFAPRFDGTEDGVETILAAIEAAGWPGKD-VFL 239 

Query: 241 GFDalSSEFYDAERKVYDYSKFEGEGGAVRTAAEQIDYLEELVNKYPII^IEEG^IDENDW 300 

GFDCASSEFYD ERKVYDY+KFEGEG AVRTA EQIDYLEELVNKYPI ITIEDGMDENDW 
Sbjct: 240 GFDCASSEFYDKERKVYDYTKFEGEGAAWTADEQIDYLEELVNKYPI ITIEDGMDENDW 299 

Query: 301 DGWKALTERLGGRVQLVGDDFFVTNTDYLRRGIKEFJLANSILIKVNQIGTLTETFEAIEM 360 

DGWK LTERLG +VQ VGDDFFVTNT YL +GI E ANSILIKVNQIGTLTETF+AIEM 
Sbjct: 300 DGWKKLTERLGKKVQPVGDDFFVTNTSYLEKGINEACANSILIKVNQIGTLTETFDAIEM 359 

Query: 361 AKEAGYTAWSHRSGETEDSTIADIAVATNAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 420 

AKEAGYTAWSHRSGETEDSTIADIAVA NAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 
Sbjct: 360 AKEAGYTAWSHRSGETEDSTIADIAVAANAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 419 

Query: 421 VAQYKGIKSFYNLKK 435 

VA+Y+G+KSFYNL K 
Sbjct: 420 VAEYRGLKSFYNLSK 434 

Proteins in the glycolysis/gluconeogenesis pathway have been experimentally detected on the surface of 
Streptococci. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3965> which encodes the amino acid 
sequence <SEQ ID 3966>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3025 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAA81815 GB:AB029313 enolase [Streptococcus intermedius] 
Identities = 396/435 (91%), Positives = 415/435 (95%), Gaps = 1/435 (0%) 

Query: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGIvlVPSGASTGEHEAVELRDGDKSRyL 60

MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRY
Sbjct: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEA.VELRDGDKSRYG 60

Query: 61 GLGTQKAVDNVNNIIAEAIIGYDVRDQCAIDRAMIALIDGTPNKGKLGANAILGVSIAVAR 120 

GLGTQKAVDNVNNIIAEA+IGYDVRDQQAIDRAI'IIALDGTPNKGKLGANAILGVSIAVAR 
Sbjct: 61 GLGTQKAVDNVNNI IAEAVIGYDVRDQQAIDRAMIALDGTPNKGKLGANAILGVS1AVAR 120 

Query: 121 AAADYLEVPLYTYLGGFNTKVLPTPNMKIINGGSH3DAPIAFQEFMIMPVGAPTFKEGLR 180 

AAADYLE+PLY+YLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMI+P GAPTFKE LR 
Sbjct: 121 AAADYLEIPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIVPAGAPTFKEALR 180 

Query: 181 WGAEVFHALKKILKERGLVTAVGDEGGFAPKFEGTZDGVETILKAIEAAGYEAGENGIMI 240 

WGAE+FHALKKILK RGL TAVGDEGGFAP+F+GTEDGVETIL AIEAAGY G++ + + 
Sbjct: 181 WGAEIFHALKK1LKSRGLATAVGDEGGFAPRFDGTEDGVETILAAIEAAGYVPGKD-VFL 239 

Query: 241 GFDCASSEFYDKERKVYDYTKFEGEGAAVRTSAEQVDYLEELVNKYPIITIEDGMDENDW 300 

GFDCASSEFYDKERKVYDYTKFEGEGAAVRT+ EQ+DYLEELVNKYPIITIEDGMDENDW 
Sbjct: 240 GFDCASSEFYDKERKVYDYTKFEGEGAAVRTADEQIDYLEELVNKYPIITIEDGMDENDW 299 

Query: 301 DGWKVLTERLGKRVQLVGDDFFTCNTEYLARGIKENAANSILIKVNQIGTLTETFEAIEM 360 

DGWK LTERLGK+VQ VGDDFFVTNT YL +GI E ANSILIKVNQIGTLTETF+AIEM 
Sbjct: 300 DGWKKLTEPiGKKVQPVGDDFFVTNTSYLEKGINEACANSIIjIKVNQIGTLTETFDAIEM 359 

Query: 361 AKEAGYTAWSHRSGETEDSTIADIAVATNAGQIKTGSLSRTDRIAKYNQLIiRIEDQLGE 420 

AKEAGYTAWSHRSGETEDSTIADIAVA NAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 
Sbjct: 360 AKEAGYTAWSIIRSGETEDSTIADIAVAANAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 419 

Query: 421 VAQYKGIKSFYNLKK 435 

VA+Y+G+KSFYNL K 
Sbjct: 420 VAEYRGLKSFYNLSK 434 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 421/435 (96%) , Positives = 427/435 (97%) 

Query: 1 MSIITDVYAREVLDSRGNPTLEVEvYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYG 60 

MSIITDvYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRY 
Sbjct: 1 MSIITDVYAREVLDSRGNPTLEVEVYTESGAFGRGMVPSGASTGEHEAVELRDGDKSRYL 60 

Query: 61 GLGTQKAVDNVMWIAFAIIGYDWDQQAIDRAMIALDGTPNKGKLGANAILGVSIAVAR 120 

GLGTQKAVDNVNN+IAEAIIGYDvRDQQAIDRAMIAiDGTPHKGKDGANAILGVSIAVAR 
Sbjct: 61 GLGTQKAVDNVNNIIAEAIIGYDVRDQQAIDRAMIALDGTPNKGKD3ANAILGVSIAVAR 120 

Query: 121 AAADYLEVPLYSYLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIMPVGAPTFKEALR 180 

AAADYLEVPLY+YLGGFNTKVLPTPMMNIINGGSHSDAPIAFQEFMIMPVGAPTFKE LR 
Sbjct: 121 AAADYLEVPLYTYLGGFNTKVLPTPMMN-INGGSHSDAPIAFQEFMIMPVGAPTFKEGLR 180 

Query: 181 WGAEVFEIALKKILKERGLETAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240 

WGAEVFHALKKI LKERGL TAVGDEGGFAPKFEGTEDGVETTLKAIEAAGYEAGENGIMI 
Sbjct: 181 WGAEVFHALKKILKERGLVTAVGDEGGFAPKFEGTEDGVETILKAIEAAGYEAGENGIMI 240 

Query: 241 GFDCASSEFYDAERKVYDYSKFEGEGGAVRTAAEQIDYLEELVNKYPIITIEDGMDENDW 300 

GFDCASSEFYD ERKVYDY+KFEGEG AVRT+AEQ+DYLEELVNKYPIITIEDGMDENDW 
Sbjct: 241 GFDCaSSEFYDKERKVYDYTKFEGEGAAWTSAEQVDYLEELVNKYPIITIEDGMDENDW 300 

Query: 301 DGWKALTERLGGRVQLVGDDFFVTNTDYLARGIKEEAANSILIKVNQIGTLTETFEAIEM 360 

DGWK LTERLG RVQLVGDDFFVTNT+YLARGIKE AANSILIKVNQIGTLTETFEAIEM 
Sbjct: 301 DGWKVLTERLGKRVQLVGDDFFVTOTEYl^GIKENAANSILIKVNQIGTLTETFEAIEM 360

Sbjct: 361 AKERGYTAWSHRSGETEDSTIADIAVATNAGQIKTGSLSRTDRIAKYNQLLRIEDQLGE 420

Query: 421 VAQYKGIKSFYNLKK 435 
VAQYKGIKSFYNLKK 
Sbjct: 421 VAQYKGIKSFYNLKK 435

SEQ ID 3964 (GBS311) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 45 (lane 3; MW 51kDa).

GBS311-His was purified as shown in Figure 203, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
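
In the alignments above, the line between each Query and Sbjct row carries the residue letter where the two sequences are identical and a "+" where the substitution is scored as conservative. As a minimal illustrative sketch (the helper name score_midline is hypothetical and not part of the original analysis), the counts behind the "Identities" and "Positives" figures can be tallied for a single block as follows. The count is only meaningful on a cleanly transcribed line, since scanning errors in the middle line distort it.

```python
def score_midline(midline: str) -> tuple[int, int]:
    """Count identities (letters) and positives (letters plus '+') in one match line."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


# Final block of the GBS enolase vs. S. intermedius alignment (residues 421-435).
identities, positives = score_midline("VA+Y+G+KSFYNL K")
print(identities, positives)
```

Summing these per-block counts over a whole alignment gives the numerators reported in its summary line.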

Example 1289 

A DNA sequence (GBSx1366) was identified in S.agalactiae <SEQ ID 3967> which encodes the amino
acid sequence <SEQ ID 3968>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1998 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
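
Each "Final Results" block lists a certainty value per candidate compartment, and the compartment marked "Affirmative" is the one with the highest value. The sketch below is a hedged illustration only: parse_localization is a hypothetical helper, and the pattern is written to tolerate the stray spaces that scanning introduced into the numbers. It reads the Example 1289 block and selects the top-scoring compartment.

```python
import re

# "bacterial <compartment> [dashes] Certainty=<value> (<verdict>)", allowing
# stray spaces inside the number such as "0 . 1998".
ROW = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d\s.]+?)\s*\(([^)]+)\)")


def parse_localization(text: str) -> dict[str, tuple[float, str]]:
    """Map compartment name to (certainty, verdict)."""
    results = {}
    for match in ROW.finditer(text):
        certainty = float(match.group(2).replace(" ", ""))
        results[match.group(1)] = (certainty, match.group(3))
    return results


block = """
bacterial cytoplasm Certainty=0 . 1998 (Affirmative)
bacterial membrane --- Certainty=0 . 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
results = parse_localization(block)
predicted = max(results, key=lambda name: results[name][0])
print(predicted, results[predicted])
```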

Example 1290 

A DNA sequence (GBSx1367) was identified in S.agalactiae <SEQ ID 3969> which encodes the amino
acid sequence <SEQ ID 3970>. This protein is predicted to be di-/tripeptide transporter. Analysis of this
protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.33 Transmembrane 117 - 133 ( 110 - 141)
INTEGRAL Likelihood = -8.44 Transmembrane
INTEGRAL Likelihood = -5.84 Transmembrane 19 - 35
INTEGRAL Likelihood = -3.08 Transmembrane ( 151 - 167)
INTEGRAL Likelihood = -2.55 Transmembrane 264 - 280 ( 264 - 281)
INTEGRAL Likelihood = -2.28 Transmembrane 44 - 60 ( 44 - 60)
INTEGRAL Likelihood = -2.02 Transmembrane 238 - 254 ( 238 - 255)


Final Results 

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9395> which encodes amino acid sequence <SEQ ID 9396> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12175 GB:Z99106 similar to di-tripeptide ABC transporter
(membrane protein) [Bacillus subtilis]
Identities = 175/359 (48%), Positives = 254/359 (70%), Gaps = 9/359 (2%)



Query: MVGNLYGENDSRRDAGFS I FVFGINLGAF I SPIVVGYLGQEVHFHLGFSLAAIGMFFGLL 60

+VG+LY + D RRD+GFSIF GINLG ++P++VG LGQ+ N+HLGF AA+GM GL+
Sbjct: 142 WGDLYTKEDPRRDSGFSIFYMGINLGGLIjAPLIVGTLGQKYNYHLGFGAAAVGMLLGLI 201

Query: 61 QYTLDGKKYLTEESLRPJSTOPLSPEEKSSLYKKVGLILIGIVIVLILLHLMHMLTIEVIID 120

+ L KK L +PLS +KS++ +G+I++ I +++ + +LTI+ ID
Sbjct: 202 VFPLTRKKNLGLAGSNVPNPLS - - KKSAIGTGIGVI IVAI AVI ISVQ - - TGVLTI KRFID 257

Query: IFSIIAIAIPIIYFIKILSSKKISSVERSRVWAYIPLFIASILFWSIEEQGSWIALFAD 180

+ SI+ I IP+IYFI ■)- +SKK E+SR+ AY+PLFI +++FW+I+EQG+ +LA++AD
Sbjct: 258 LVSILGILIPVIYFIIMFTSKKADKTEKSRIAAYVPLFIGAVMFWAIQEQGATILAVYAD 317

Query: 181 EQTICLYLNFFGHHINFPSSYFQSMNPLFIMLYVPFFAWLWAKWGSKQPSSPKKFAYGLFF 240

E+ +L L F SS+FQS+NPLF++++ P FAWLW K G +QPS+P KF+ G+
Sbjct: 318 ERIRLSLGGF ELQSSWFQSLNPLFWIFAPIFAWLWMKLGKRQPSTPVKFSIGI1L

Query: 241 AGASFLWMMLPGLLFGVNAKVSPLWLTMSWAIVIVGEMLISPVGLSATSKLAPKAFQAQM 300

AG SF+ M+ P + G A VSPLWL +S+ +V++GE+ +SPVGLS T+KLAP AF AQ
Sbjct: 374 AGLSFIIMVFPAMQ - GKEALVS PLWLVLS FLL VVLGELCLS PVGLS VTTKLAPAAFSAQT 432

Query: 301 MSIWFLSNAAaQAINAQIVKLYTPDTQTLYYGWGGITWFGFILLFYVPRIEKLMSGV 359

MS+WFL+NAAAQAINAQ+ L+ 4-T+Y+G 4G I++V G ILL P I++ M GV
Sbjct: 433 MSMWFLTNAAAQAINAQVAGLFDKI PETMYFGTIGLI S I VLGGILLLLS PVI KRAMKGV 491



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
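
The transmembrane predictions in this section follow a fixed layout, for example "INTEGRAL Likelihood = -7.96 Transmembrane 159 - 175 ( 152 - 180)". As a minimal sketch under that assumption (the Helix type and parse_helices function are hypothetical names, not part of the original analysis), such lines can be collected and ranked, with the most negative likelihood listed first as in the reports above.

```python
import re
from typing import NamedTuple


class Helix(NamedTuple):
    likelihood: float
    start: int        # first residue of the predicted segment
    end: int          # last residue of the predicted segment
    outer_start: int  # range given in parentheses
    outer_end: int


ROW = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_helices(text: str) -> list[Helix]:
    """Return complete segment lines, strongest (most negative) likelihood first."""
    helices = [
        Helix(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in ROW.findall(text)
    ]
    return sorted(helices, key=lambda h: h.likelihood)


report = """
INTEGRAL Likelihood = -2.55 Transmembrane 264 - 280 ( 264 - 281)
INTEGRAL Likelihood = -7.96 Transmembrane 159 - 175 ( 152 - 180)
"""
helices = parse_helices(report)
lengths = [h.end - h.start + 1 for h in helices]
print(helices, lengths)
```

Lines whose coordinates were lost in scanning do not match the pattern and are skipped rather than guessed.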

Example 1291 

A DNA sequence (GBSx1369) was identified in S.agalactiae <SEQ ID 3971> which encodes the amino
acid sequence <SEQ ID 3972>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1292 

A DNA sequence (GBSx1370) was identified in S.agalactiae <SEQ ID 3973> which encodes the amino
acid sequence <SEQ ID 3974>. Analysis of this protein sequence reveals the following:

50 Possible site: 30 

»> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 2485 (Affirmative) < suco 
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- Certainty=0. 0000 (Not Clear) < suco 

- Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF61315 GB:D96166 unknown [Streptococcus cristatus] 
Identities = 181/442 (40%), Positives = 270/442 (60%), Gaps = 2/442 (0%) 

Query: 1 MINLFDSYTQSSWDLHFSLIKSGYINPTIMNDDGFLPDDVTSPYLYYTGFAKTGAGRPL 60 

MI LFD Y Q+S+DL SL +G P + + DDG+L DV SPY Y+TG T GRP+ 
Sbjct: 1 MICLFDRYDQASFDLLRSLKATGLDCPWWQDDGYLSPDVBSPYSYFTGDLDTPEGRPI 60 

Query: 61 YYNELRVPDTWEIIGFSSGADIVDLGVKKGRIIYANPNHKRLIKEVDWFDEQGRVILKDR 120 

Y+N + P WEI + +I+D+G K+ I Y P H+R ++ V+W D +G+V D 
Sbjct: 61 YFNLVPKPHr,WEIRSSNVNGEILDMGKKRAKIFYRQPTH3RRVRAVEWLDTEGQVRAADI 120 

Query: 121 FNKFGFCFAQTFYNADGQAIQTSYYNKDRQEVISENHMTGDYILNDNNQFKVFKSKVEFV 180 

+N+ G FAQ Y+ + T Y+++ VI ENH+TGD IL + +FKSK EFV 
Sbjct: 121 YNRKGRLFAQITYDQTQRPTHTRYFDQSNVWIMENHLTGDIILTLEGKRHIFKSKQEFV 180 

Query: 181 INYLQEAKFNLDRIFYNSLSTPFLVSFYL- -NRLESKDVLFWQEPLVDDIPGNMRLLLNN 238 

+ YLQ ++ DRI YNSL4TPFLV++ L ++DVLFWQEP+ + +PGNM++ + 

Sbjct: 181 VFYLQYRGYDTDRIIYNSLATPFLVAYALRPKNGRAEDVLFWQEPIGEALPGNMKVAMKM 240 

Query: 239 PSPNTKIVIQSYEAYANAMRLLTDEEQKQVSFLGFMYPLKETEKLHNQALILTNSDQIEA 298 

P N +1 +Q + Y L T EE+ +G++Y + ++ +ALILTNSDQ+E 

Sbjct: 241 PHRNIRIAVQDRQVYEKIQSLATPEEKVYFHNIGYIYDYQRLNNMNPEALILTNSDQLEQ 300 

Query: 299 LESLOTSLPNLTFNIGALTEMSSDLMNFGKYDOTVLYPNITTNQIQYLSNICAFYLDINH 358

+E L+T LPN+ F+IGA+TEMS LM +Y NV LYPNI ++ L C YLDIN
Sbjct: 301 IEQLLTQLPNVHFHIGAITEMSGHLMGLNRYPNVSLYPNIRPAKVAELFERCDLYLDINI 360



Query: 419 KALKQQLEDCHVSSSTQYQSVI 440 

AL +Q + + +S QY+++I 
Sbjct: 421 AALTRQKQAMStQASLEQYKAI I 442 
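
The summary line that heads each GENPEPT hit (identities, positives and gaps, each given as a count over the aligned length) has a fixed shape, so it can be read mechanically when comparing hits across examples. The following is a minimal Python sketch under that assumption; the function name `parse_hit_summary` and the dictionary keys are illustrative and are not part of the original analysis.

```python
import re

# One "Identities = a/b (p%), Positives = c/d (q%), Gaps = e/f (r%)" line.
# The Gaps clause is optional because some hits are printed without it.
SUMMARY = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)\s*,\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
    r"(?:\s*,\s*Gaps\s*=\s*(\d+)/(\d+)\s*\((\d+)%\))?"
)


def parse_hit_summary(line):
    """Return the counts and printed percentages of one hit summary line."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    g = m.groups()
    return {
        "identities": int(g[0]),
        "aligned_length": int(g[1]),
        "identity_pct": int(g[2]),
        "positives": int(g[3]),
        "positive_pct": int(g[5]),
        # A missing Gaps clause is treated as zero gaps.
        "gaps": int(g[6]) if g[6] is not None else 0,
        "gap_pct": int(g[8]) if g[8] is not None else 0,
    }


# Summary line of the Streptococcus cristatus hit shown above.
hit = parse_hit_summary(
    "Identities = 181/442 (40%), Positives = 270/442 (60%), Gaps = 2/442 (0%)"
)
print(hit)
```

The percentages are taken as printed rather than recomputed, since the rounding used by the search program is not stated in the text.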

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 1293 

A DNA sequence (GBSx1371) was identified in S.agalactiae <SEQ ID 3975> which encodes the amino
acid sequence <SEQ ID 3976>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.06 Transmembrane 405 - 421 ( 404 - 422)

Final Results

bacterial membrane --- Certainty=0.1022 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA94320 GB:AB033763 hypothetical protein [Staphylococcus 
aureus] 

Identities = 66/195 (33%), Positives = 99/195 (49%), Gaps = 9/195 (4%)

Query: 259 NYYDyQETNANRFDFFITSTDKQTELLEQQFKQFrNHNPRIITIPVGSID NLKMPM 314

N Y + F N NR+ I ST +Q + N+ + TIPVG ID NLK 

Sbjct: 15 NTYKHVFNNLNRYSGI I VSTKQQ QLDISARINNEIPVHTIPVGYIDEHFTNLKRNN 70 



Query: 375 TEYIRLMG-HKNLSNVYQNYELYLTAEKSEGFGLTLLEAIGAGLPLIGFDWYGNQTFIK 433 
+ L G +NLS Q+ + L S EGF L LLE I G+P +G++ +YG I

Sbjct: 131 ENWVFLRGFRRNLSAEIQDAYMSLITSNMEGFNtiGLLETITEGIPPVGYNSKYGPSEIjlL 190 

Query: 434 DGENGYLIPRFDMDD 448 
+ ENGYLI + D D+ 
Sbjct: 191 NNENGYLINKNDKDE 205
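
In these alignments the middle row repeats a residue where query and subject are identical and shows `+` where the substitution is scored as conservative; the "Identities" and "Positives" totals count those columns. A small Python sketch of that bookkeeping for one block follows. It assumes the three rows are the same length and still column-aligned, which extraction does not always preserve, and the helper name `alignment_counts` is hypothetical.

```python
def alignment_counts(query, midline, subject):
    """Count identities and positives in one aligned block.

    Identities are columns where both rows carry the same residue;
    positives are the identities plus the columns marked '+'.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("aligned rows must have equal length")
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    positives = identities + midline.count("+")
    return identities, positives, len(query)


# Last block of the Staphylococcus aureus alignment (query 434-448).
query = "DGENGYLIPRFDMDD"
midline = "+ ENGYLI + D D+"
subject = "NNENGYLINKNDKDE"

identities, positives, columns = alignment_counts(query, midline, subject)
print(f"{identities}/{columns} identities, {positives}/{columns} positives")
```

Summing these per-block counts over a whole alignment should reproduce the totals in the hit summary only when every block has survived extraction intact.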

SEQ ID 3976 (GBS426) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 4; MW 58.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 3; MW 84kDa). 

GBS426-GST was purified as shown in Figure 220, lane 5.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1294 

A DNA sequence (GBSx1372) was identified in S.agalactiae <SEQ ID 3977> which encodes the amino
acid sequence <SEQ ID 3978>. This protein is predicted to be preprotein translocase seca subunit (secA).
Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.69 Transmembrane 75 - 91 ( 75 - 91)

Final Results

bacterial membrane --- Certainty=0.1277 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44957 GB:U56901 involved in protein export [Bacillus subtilis] 
Identities = 336/794 (42%), Positives = 506/794 (63%), Gaps = 29/794 (3%)



FAWREA RV G+FP+ VQ++GG+ LH GN AEKKTGEGKTLT+T+P+YLNAL GKG 



++T N YLA RDAE+MGK++ FLGL+VG+ ++ 





5 


Sbj ct : 


6 




65 


Sbjct: 


66 




125 


Sbjct: 


126 




185 


Sbjct: 


181 




245 


Sbjct: 





LH+A++DE D++L+D A+TPL+ISG 



WO 02/34771 



PCT/GB01/04789 



Query: 305 ERGKDYVVDDGEI KLLDATOGRVLEGTKI^JGGVHQAI EQK2HLNVTPESRAMAS ITYQNL 364 

++ DYW+DG++ ++D+ GR+++G + G+HQAIE KE h + ES +A+IT+QN 
Sbjct: 301 QKDVDYWEDGQWIVDSFTGRLMKGRRYSEGLHQAIEAKEGLEIQNESMTLATITFQNY 360 

Query: 365 PRMFTKIAGMTGTGKTAEKEFIEVYDMEVVRIPTNSPVRRIDYPDKIYTTLPEKIHATIE 424 

FRM+ KLAGMTGT KT E+EF +Y+M+W IPTN PV R D PD IY T+ K A E 
Sbjot: 361 FRNKEKI^GMTGTAKTEEEEFRNIYNMQVVTIPTNRPVVRDDRPDLIYRTMEGKFKAVAE 420 

Query: 425 FVKQVHDTGQPILLraGSVRMSELJSELIJLLSGIPHSLLNAQSAVKEAQMIAEAGQKGAV 484 

V Q + TGQP+L+ +V SEL S+DL GIPH +LNA++ +KAQ+1 EAGQKGAV 
Sbjct: 421 DVAQRYMTGQPVLVGTVAVETSEL I S KLLKNKG I PHQVLNAKMHEREAQI IEEAGQKGAV 480 

Query: 485 TVATNMAGRGTDIKLGKGVSELGGIAVIGTERMKSQRMDLQLRGRSGRQGDIGFSQFFVS 544 

T+ATNMAGRGTDIKLG+GV ELGGLAV+GTER +S+R+D QLRGRSGRQGD G +QF++S 
Sbjct: 481 TIATNMAGRGTDIKLGEGVKELGGIAWGTEREESRRIDNQLRGRSGRQGDPGITQFYLS 540 

Query: 545 FEDDLMIESGPKWAQDYFRKNRDKVNPEKPKALGQRRFQKLFQQTQFJVSDGKGESARSQT 604 

ED+LM G + D+ + + + + + +Q+ +G +R Q 

Sbjct: 541 MEDELMRRFGAERTMAML DRFGMDDSTPIQSKMVSRAVESSQKRVEGNNFDSRKQL 596 

Query: 605 IEFDSSVQLQREYVYRERNALINGESGHFSPRQIIDTVISSFI AYLDGEVEKEEL 659 

+-H-D ++ QRE +Y++R +1+ E + R+I++ +1 S + AY E EE 

Sbjct: 597 LQYDDVLRQQREVIYKQRFEVIDSE NLRE I VENM I KS SLERA1 AAYTPREELPEE - 651 

Query: 660 IFEVNRFI - FDNMSYNLQGISKEMSL- -EEIKNYLFKIADEILREKHNLLGDSFG 711 

++++ + N +Y +G ++ + +E L I D 1+ K+N + FG 
Sbjct: 652 -WKLDGLVDLIKTTYLDEGALEKSDIFGKEPDEMLELIMDRII-TKYMEKEEQFGKEQMR 709 

Query: 712 DFERTAALKAIDEAWIEEVDYLQQLRTVATARQTAQRKPVFEYHKEAYKSYNIMKKEIRE 771 
+FE+ L+A+D W++ +D + QLR R AQ NP+ EY E + + M + I • + 

' Sbjct: 710 EFEKVIVLRAVDSKWMDHIDAMDQliRQGIHLRAYAQTNPLREYQMEGFAMFEHMIES IED 7 6 9 

Query: 772 QTFRNLLLSEVSFN 785 

+ + ++ +E+ N 
Sbjct: 770 EVAKFVMKAEIEKN 783 

There is also homology to SEQ ID 3620. 

SEQ ID 3978 (GBS425) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 3; MW 91kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 2; MW 116kDa).

GBS425-GST was purified as shown in Figure 220, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 1295 

A DNA sequence (GBSx1373) was identified in S.agalactiae <SEQ ID 3979> which encodes the amino
acid sequence <SEQ ID 3980>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3827 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1296 

A DNA sequence (GBSx1374) was identified in S.agalactiae <SEQ ID 3981> which encodes the amino
acid sequence <SEQ ID 3982>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2683 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10001> which encodes amino acid sequence <SEQ ID
10002> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1297 

A DNA sequence (GBSx1375) was identified in S.agalactiae <SEQ ID 3983> which encodes the amino
acid sequence <SEQ ID 3984>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5410 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
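
Each "Final Results" block assigns a certainty value to three candidate locations (cytoplasm, membrane, outside), and the location with the highest value is the one flagged "Affirmative". A minimal Python sketch for reading such a block is given below. It assumes only the line layout visible in these examples and tolerates the stray spaces that text extraction leaves inside the numbers; `parse_final_results` and `best_localisation` are illustrative names, not functions of the prediction program.

```python
import re

# "bacterial <site> [---] Certainty=<value> (<call>)"; \W* absorbs spaces
# and any dash run between the site name and the word "Certainty".
RESULT = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>\d[\d.\s]*?)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples found in text."""
    results = []
    for m in RESULT.finditer(text):
        value = float(m.group("value").replace(" ", ""))
        results.append((m.group("site"), value, m.group("call")))
    return results


def best_localisation(text):
    """Return the entry with the highest certainty, or None if none parse."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None


# The three result lines of Example 1297, with extraction spacing left in.
sample = """
bacterial cytoplasm Certainty=0. 5410 (Affirmative)
bacterial membrane Certainty=0 . 0000 (Not Clear)
bacterial outside Certainty=0. 0000 (Not Clear)
"""
print(best_localisation(sample))
```

When every value is 0.0000, as in Example 1291, `max` simply returns the first entry, so a tie should be treated as "no call" by the caller.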

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 1298 

A DNA sequence (GBSx1376) was identified in S.agalactiae <SEQ ID 3985> which encodes the amino
acid sequence <SEQ ID 3986>. This protein is predicted to be preprotein translocase secy subunit. Analysis
of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.92 Transmembrane 287 - 303 ( 278 - 309)
INTEGRAL Likelihood = -9.08 Transmembrane 191 - 207 ( 186 - 210)
INTEGRAL Likelihood = -8.44 Transmembrane 104 - 120 ( 101 - 123)
INTEGRAL Likelihood = -8.23 Transmembrane 11 - 27 ( 9 - 41)
INTEGRAL Likelihood = -3.93 Transmembrane ... - 150)
INTEGRAL Likelihood = -3.19 Transmembrane ... - 364)
INTEGRAL Likelihood = -2.97 Transmembrane ... - 174)
INTEGRAL Likelihood = -1.54 Transmembrane ... - 262)
INTEGRAL Likelihood = -0.90 Transmembrane ... - 388)
INTEGRAL Likelihood = -0.85 Transmembrane ...


Final Results

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF30659 GB:AE002122 preprotein translocase [Ureaplasma urealyticum] 
Identities = 105/422 (24%), Positives = 213/422 (49%), Gaps = 49/422 (11%)



Query: 2 KLLYIFEKNIILRKILITFSLIIIFLLGRWPIPGVLISAYKGQDNNFATLYSTVTGGNL 61 

+LL IF+ +L +++T S++I+F +G +P+P + ++ G +F ++ + + GG L 
Sbjct: 13 QLLMIFKNKKITLVALIVTLSILILFRIGSVIPMPYIKLNGNFGNQGSFFSIINLLGGGGL 72 

Query: 62 SQVGVFSLGIGPMMTTMILLRLFT IGKYSSGVSQKVQQFRQNWMLVIAII 112 

SQ +F++GIGP +T I+++L + + K +K++ + ++ L +A++ 

Sbjct: 73 SQFSLFAIGIGPYITAQIIMQLLSSELVPPLAKLSKSGERGRKKIEVITR-IITLPLAVM 131 

Query: 113 QGLAITISFQYHNGFSL TKLLLATMI - - LVTGAYIISWIGNLNAEYGFG- 159 

Q + I NGF + L T I +V G YI ++ +L ++ G G 

Sbjct: 132 QAVIIINLMTRANGFISIVSNAPFAIGSPLFYVTYIFLMVGGTYISLFLADLISKKGVGN 191 

Query: 160 GMTILWVGMLVGQFHNI PLIFELF QDGYQLAIILFLLWTLVAMYLMITFERSE 213 

G+T+L++ G++ FN+ IF + + IL++L+ ++ + ++ S 

Sbjct: 192 GITLLILTGIVASLFNHFIAIFSNLGSLTSSKVSQIIGFILYILFYIMILIGWFVNNST 251 



Query: 214 YRIPVMRTS IHNRLVDEffiYMPIKVNASGGMAFMyVYTLLMFPQYI I ILLRSIFPT 268 

+ IPV +T H +L. ++PIK+ +G M ++ ++L P + L 

Sbjct: 252 RKIPVQQTGQALILDHEKL--PFLPIKIMTAGVMPVIFASSVLAIPAQVAEFLDK---Q 305



Query: 269 NPDITSYNDYFSLSSIQGWIYMILMLVLSVAFTFVNIDPTKISEAMRESGDFIPNYRPG 328 

+ ++YF + S G+ IY++L+L+ + F++V ++P K++E ++++G FIP + G 

Sbjct: 306 SMGYYVIHNYFIVDSVreGLAIYVVLILLFTFFFSYVQIjIIPPKMAEDIKKAGRFIPGVQVG 365 


Query: 329 KETQSYLSKICYLFGTFSGFFMAFLGGVPLLFALGNDDLR TVSSMTGIFMM 379 

+T+ +++K+ Y +AFD +P L AL + T+ T I +M 

Sbjct: 366 raDTEKHITKVIYRVNWIGAPILAFLACLPHLVALVAKTINHGIPVIQPSTIFGGTSIIIM 425 

Query: 380 IT 381

+T 

Sbjct: 426 VT 427 



There is also homology to SEQ ID 3988. 

A related GBS gene <SEQ ID 8783> and protein <SEQ ID 8784> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 6.32
GvH: Signal Score (-7.5): -4.07
Possible site: 59

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 10 value: -9.92 threshold: 0.0

INTEGRAL Likelihood = -9.92 Transmembrane 287 - 303 ( 278 - 309)
INTEGRAL Likelihood = -9.08 Transmembrane 191 - 207 ( 186 - 210)
INTEGRAL Likelihood = -8.44 Transmembrane 104 - ... ( 101 - 123)
INTEGRAL Likelihood = -8.23 Transmembrane 11 - 27 ( 9 - 41)
INTEGRAL Likelihood = -3.93 Transmembrane 133 - 149 ( 129 - 150)
INTEGRAL Likelihood = -3.19 Transmembrane 347 - ... ( 344 - ...)
INTEGRAL Likelihood = -2.97 Transmembrane 158 - ... ( 155 - 174)
INTEGRAL Likelihood = -1.54 Transmembrane 246 - 262 ( 245 - 262)
INTEGRAL Likelihood = -0.90 Transmembrane 372 - 388 ( 372 - 388)
INTEGRAL Likelihood = -0.85 Transmembrane 64 - 80 ( 64 - 81)
PERIPHERAL Likelihood = 8.65 28
modified ALOM score: 2.48
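
Each INTEGRAL row of the ALOM output gives a likelihood, a core transmembrane segment and, in parentheses, an extended range; in this example the reported `value` (-9.92) is the most negative likelihood in the table. The Python sketch below shows one way to turn such rows into sorted segments. It assumes the row layout seen in this example, uses only rows whose coordinates are fully legible in the text, and the name `parse_alom_rows` is hypothetical.

```python
import re

# "INTEGRAL Likelihood = -9.92 Transmembrane 287 - 303 ( 278 - 309)"
ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_alom_rows(text):
    """Return predicted segments sorted by the start of the core segment."""
    segments = []
    for m in ROW.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


# Rows from this example that are complete in the extracted text.
rows = """
INTEGRAL Likelihood = -9.92 Transmembrane 287 - 303 ( 278 - 309)
INTEGRAL Likelihood = -9.08 Transmembrane 191 - 207 ( 186 - 210)
INTEGRAL Likelihood = -8.44 Transmembrane 104 - 120 ( 101 - 123)
INTEGRAL Likelihood = -8.23 Transmembrane 11 - 27 ( 9 - 41)
INTEGRAL Likelihood = -1.54 Transmembrane 246 - 262 ( 245 - 262)
INTEGRAL Likelihood = -0.90 Transmembrane 372 - 388 ( 372 - 388)
INTEGRAL Likelihood = -0.85 Transmembrane 64 - 80 ( 64 - 81)
"""

segments = parse_alom_rows(rows)
for seg in segments:
    start, end = seg["core"]
    print(f"{start:>3}-{end:<3} length {end - start + 1:>2} "
          f"likelihood {seg['likelihood']:.2f}")
```

Rows with a missing coordinate are skipped by the pattern rather than guessed, so the segment count from this sketch can be lower than the `count` printed by the program.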

*** Reasoning Step: 3 



Final Results

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02350(316 - 1500 of 1827) 

EGAD|6621|6420(8 - 426 of 431) preprotein translocase secy subunit {Bacillus sp.}
SP|P38375|SECY_BACHD PREPROTEIN TRANSLOCASE SECY SUBUNIT. GP|484251|dbj|BAA01191.1||D10360
secretion protein Y {Bacillus sp.} PIR|B44859|B44859 preprotein translocase secY - Bacillus sp.

%Match =12.1 

%Identity =26.8 %Similarity =55.4 

Matches = 109 Mismatches = 165 Conservative Sub.s = 116 

57 87 117 147 177 207 237 267 

EVWNWDRCITEGKTIYGIRRARKI)NQYISFERTMDDFEYLCDTIKQ1TO*SRRVMW*ILKSIFLILKLTKLTI*SYLS* 

297 327 357 387 441 471 501 

REQIDREREIPLKLLYIFEKNIILRKILITFSLIIIFLLGRYVPIPGV- -LISAYKGQDNNFATLYSTVTGGNLSQVGVF 
II •• II.-: I: :':|.- I =1 "Ml « : I I I I =1 II I :| 
MFRTISNIFRVGDLRRKVIFTLLMLIVFRIGSFIPVPGTNREVLDFVDQANAFGFL-NTFGGGALGNFSIF 



CLGIGPMMTTMILLRLF TIGKYSSGVSQ KVQQFRQNWMLVIAIIQGIAITISFQ-YHNGF SLTKLL 

::|| I :| h = = U = l = = = 1= II = = = h II I "= I = h l = = I 

AMGIMPYITASIVMQLLQMI^WPKFAEWAKEGEAGRRKLAQFTRYGTIVVLGFIQALGMSVGFNNFFPGLIPNPSVSVYIi 



LATMILVTGAYIISWIGNLNAEYGFG-GMTILWVGMLVGQFNNIPLIFEL-FQD-GYQL AIILFLLWTLVAMYLM 

: ::| I « M I I l = M = = h I h 11= II I II = 11 = 1 = = l = = 

FIALVLTAGTAFLIWiGEQITAKGVGNGISIIIFAGIAAGIPNGI^IYSTRIQDAGEQLFLNIWILLLALAILAIIVG 
160 170 180 190 200 210 220 230 

966 1023 1053 1083 1113 1143 

ITFERSEYR-IPVM RTSIHNRLVDDA-YMPIKVNASGGMAFl^WVYTLLMFPQYIIILLRSIFPTNPDITSYNDYFSL 

= I = I III I I = = = :« =IMI = 1=11 « II 

VIFVQQALRKI PVQYAKRLVGRNPVGGQSTHLPLKVNAAGVI PVI FALSLLIFPPTVAGLFGSDHPVAAWVIETFDY- - - 
240 250 260 270 280 290 300 

1173 1203 1233 1263 1293 1323 1353 1383 

SSIOGWIYMILMLVLSVAFTFVNIDPTKISEAt'IRESGDFIPNYRPGKETQSYLSKICYLFGTFSGFFMAFLGGVPLLEA 
: : |: :| : :: := : |: ::| :::| ::: | :|| |||| ||:|:: | | : :|:| : :|::| 

THLIG^VYALRIIGFTYFYAFIQVNPERMAEItt.KKQGGYIPGIRPGKATQTYITPILYRLTFVGSLFLAWAILPVFF- 
320 330 340 350 360 370 380 

1413 1440 1470 1500 1530 1560 1590 1620 

LGNDDLRTVSSMTGI - FMMITGMSFMILDEFQVIRIRKQYTSVFENEEN* CFILFHLGIMKIVLGMI I ITCGISSRLMSV 
: || : | :::: |::: : : : |:: | 

IKFADLPQAIQIGGTGLLIWGVALDTMKQIEAQLIKRSYKGFIK 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1299

A DNA sequence (GBSx1377) was identified in S.agalactiae <SEQ ID 3989> which encodes the amino
acid sequence <SEQ ID 3990>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3002 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 276 ALTVTLTDDIWELEHLLQRCPNTDFHIAAPVYCSDRLKQLVGYPNYYLHEA.ITEEQFEVL 335 

AL +T +D + ++E LL + PN FHI A S L L YPN L+ I + L 
Sbjct: 289 ALILTNSDQLEQIEQLLTQLPNVHFHIGAITEMSGHLMGLNRYPNVSLYPNIRPAKVAEL 348 

Query: 336 LLNSDIYLDINHGEEVWN 353 

D+YLDIN +E+ N 
Sbjct: 349 FERCDLYLDINISDEILN 366 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1300 

A DNA sequence (GBSx1378) was identified in S.agalactiae <SEQ ID 3991> which encodes the amino
acid sequence <SEQ ID 3992>. This protein is predicted to be eps7. Analysis of this protein sequence
reveals the following:

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC07458 GB:AX009404 product = eps7 [Streptococcus thermophilus] 
Identities = 87/232 (37%) , Positives = 133/232 (56%) , Gaps = 22/232 (9%) 

Query: 10 VSVIIPVYNAAPYLEGCVNTILGQTYQVFEILLIDDGSTDTSASICDQLSLRDNRIRVFH 69 

+S++IPVYN Y++ C+++IL QT+ EI+L+DDGSTD S ICD S D RI+V H 
Sbjct: 3 ISIVIPVYNVQDYIKKCLDSILSQTFSDLEIILVDDGSTDLSC-RICDYYSENDKRIKVIH 62 

Query: 70 IENGGASKARNFGI^ISPESQFvTEVDSDDWvTCENYL 129 

NGG S4ARN G+ + S+++TF+DSDD+V +Y+E h + +NADI I+++ 
Sbjct: 63 TANGGQSEARNVGIKNAT--SEWITFIDSDDYVSSDYIEYLYNLIQVHNADISIASF--- 117 

Query: 130 RETEDIFGYYITDKDFV IEEISAQTAIDRQVHWHLNSSVFIVIWGKLYRRELFD 183 

YIT K + + + A+TAI R + LN + +WGK+YR E F+ 
Sbjct: 118 TYITPKKIIKHGNGEVALMDAKTAIRRML LNEGFDMGWGKMYRTEYFN 166 






Query: 184 TITFPIDKVFEDELVSVLLFI KSKKTILVNGSYYGYRIRPNS IMTSAFSSKR 235 
F K+FED L++ 4F ++ + Y Y R NS + F+ K+ 
Sbjct: 167 KYKFVS6KLFEDSLITYQIFSEASTIVFGAKDIYFYVHRKNSTVNGTFNIKK 218

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1301 

A DNA sequence (GBSx1379) was identified in S.agalactiae <SEQ ID 3993> which encodes the amino
acid sequence <SEQ ID 3994>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1569 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1302 

A DNA sequence (GBSx1380) was identified in S.agalactiae <SEQ ID 3995> which encodes the amino
acid sequence <SEQ ID 3996>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1662 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1303 

A DNA sequence (GBSx1381) was identified in S.agalactiae <SEQ ID 3997> which encodes the amino
acid sequence <SEQ ID 3998>. This protein is predicted to be a glycosyl transferase (gspA). Analysis of
this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2606 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 




>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus 
ducreyi] 

Identities = 62/177 (35%) , Positives = 105/177 (59%) , Gaps = 8/177 (4%) 

Query: 3 YARYYIPQLIDAEKVLYLDIDTLWDNLDKLFE1ELGDYPIAAILD--GDGIY FN 55 

+ RY+I 1+ +KV+YLD D +V +L +L++ ++ +Y +AA+ D + IY FN 
Sbjct: 89 FFRYFISDFIEQDCTIYIiDMJIVWGSLTELYQTDISNYFLAAVKDIISEKIYvNNHIFN 148 

Query: 56 SGVMLINSLYWMRYRVTEKLLEITERELDNGIFGDQGVLNLLFDNTMLKLEDKYNAQVGN 115 

+G++LIN+ W + +T+ L ++E+ +++ DQ +OT.4-F + WLKL YN +G 
Sbjct: 149 AGMLLINNKKWREHNITQFCLSr.SEKYINSr.PDflDQSILNLIFKDKra,KLlTOGYNYLIGT 208 

Query: 116 DLGAFYENWQGYFDRNFES-PTIIHYCTHDKPKNTFSSSRFRETWWQYEQLDWNEVF 171 

D F Y + E+ P IIHY T KPW ++RFR +W Y +L+W +++ 

Sbjct: 209 DYLFFKYGKTRYLEDLGETI PLI IHYNTEAKPKLNI FNTRFRNIYWFYYEIJnIWQDIY 265 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 1304 

A DNA sequence (GBSx1384) was identified in S.agalactiae <SEQ ID 3999> which encodes the amino
acid sequence <SEQ ID 4000>. This protein is predicted to be a glycosyl transferase. Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1157 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus 
ducreyi] 

Identities = 103/259 (39%) , Positives = 156/259 (59%) , Gaps = 3/259 (1%) 



T+FRYFI + I +DKV+YLD D+++ 



WRE I+Q L + + +L DQ +LN 



D WL+L+ YNY G D Ii+ + ++ + + +P +IHY T KPW 



+R+I+W Y L W+DI+ A 



Query: 


7 


Sbjct: 


10 


Query: 


67 


Sbjct: 


70 


Query: 


126 


Sbjct: 


130 




186 


Sbjct: 


190 


Query: 


245 


Sbjct: 


249 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1305 

A DNA sequence (GBSx1385) was identified in S.agalactiae <SEQ ID 4001> which encodes the amino
acid sequence <SEQ ID 4002>. This protein is predicted to be a glycosyl transferase. Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2679 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus 
ducreyi] 

Identities = 94/263 (35%) , Positives = 158/263 (59%) , Gaps = 4/263 (1%) 



Query: 




Sbjct: 


7 




62 


Sbjct: 


67 






Sbjct: 


127 




180 


Sbjct: 


185 


Query: 


240 


Sbjct: 


245 



FPR+PI +E++ V+YI1D+D++V GSL L+ ++ 



FN+G++LINN W++ I 



LN++ +++W+ t + !H IGD YG+ + P+I+HYNt-t- KPW 



+R+R+ +W+Y+ L W IYA+ 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1306 

A DNA sequence (GBSx1386) was identified in S.agalactiae <SEQ ID 4003> which encodes the amino
acid sequence <SEQ ID 4004>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2996 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10003> which encodes amino acid sequence <SEQ ID
10004> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75095 GB:AE000294 putative Galf transferase [Escherichia coli K12] 
Identities = 68/286 (23%), Positives = 122/286 (41%), Gaps = 18/286 (6%)





Query: 77 STRMDGI IAGLGRGDI WFQVPTWNSTEFDELFLDKU2AYGARI ITFVHDIVPLMFESNF 136

S ++ + GL D+++F P F +L + RI+ +HDI L
Sbjct: 50 SVKLSTFLCGLENKDVLIFNFPMAKPFWHILSFFHRLLKF--RIVPLIHDIDELRGGGGS 107

Query: 137 YLLDRVIDMYKRSDWILPTKAMHDYLIEKGMTTSKVLYQEVWDHPVNIDLPRPEC- - -Q 193

D V D+VI M YL K M+ K+ +++D+ V+ D+ + Q
Sbjct: 108 - - -DSV- -RLATCDMVISHNPQMTKYL-SKYI1SQDKIKDIKIFDYLVSSDVEHRDA7TDKQ 161

Query: 194 KVLSFAGDIQRFPFV2TOWKENIPLIYYGDGSRI^SEaNVHAQGWKDDVELMLSLSKRG--G 252

+ + +AG++ R + E +G ++ N G D + ++ G
Sbjct: RGVIYAGNLSRHKCSFIYTEGCDFTLFG--VNYENKDNPKYLG-SFDAQSPEKINLPGMQ 218



Query: 253 FGLCWSEDREELVERR YSRMNAS YKLSTFLAAGLP I IANHD I SSRDFI KQHGLGFTV 309 

FGL WDE Y + N+KS +L4- LP+ + DFI + +G+ V 

Sbjct: 219 FGLIWDGDSVETCSGAFGDYLKFNNPHKTSLYLSMELPVFIWDKAALADFIVXINRIGYAV 278 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1307 

A DNA sequence (GBSx1387) was identified in S.agalactiae <SEQ ID 4005> which encodes the amino
acid sequence <SEQ ID 4006>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3098 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA73093 GB:M76233 [Rabbit smooth muscle myosin light chain
kinase mRNA, complete CDS.], gene product [Oryctolagus cuniculus]
Identities = 23/63 (36%), Positives = 36/63 (56%)



Query: 5 QPAPALQRVRQCQPAPVLQPVPRCQPALALQRVRQCQPAQVLQQVPRCQPAQVLQQVPRC 64 

4 PA L+ V +PA L+PV +PA L+ V +PA+ L+ V +PA+ L+ V 
Sbjct: 225 KPAETLKPVGNAKPAETDKPVGNAKPAETLKPVGNAKPAETLKPVGNAKPAETLKAVANA 284 

Query: 65 QPA 67 
+PA 

Sbjct: 285 KPA 287 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 1308 

A DNA sequence (GBSx1388) was identified in S.agalactiae <SEQ ID 4007> which encodes the amino
acid sequence <SEQ ID 4008>. Analysis of this protein sequence reveals the following:



Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 189 - 



Transmembrane 115 - 



Transmembrane 
Transmembrane 
Transmembrane 



135 - 151 
155 - 171 



235 - 251 
253 - 269 



Final Results

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC16164 GB:AF010496 ice nucleation protein [Rhodobacter capsulatus]
Identities = 85/286 (29%), Positives = 119/286 (40%), Gaps = 17/286 (5%)

Query: 3 ALVIiADVDALVETLVLADVVALlEALVLADIEALV EALVLADIEALVEALVLADID 58 

ALAALT+ A++L AD+ L +ALAIAL++A

Sbjct: 523 ALSDAQAGALTSTQIGLLSTAAVKGLSTADMAGLTTAEAQALTSAQIAALSSSQIRAMTT 582 

Query: 59 ALVEALVLAD I EAL VEALVL AD I DALVEALVLADVEAL I EALVLALVEALVLAD VE 114 

A + AL A 1+ L + +L ADI AL A + I AL +LV A+ AD+ 

35 Sbjct: 583 AQIAALGTAQIKGLTASNILGLETADIVALTTTQAPALSSSQIAALSTSLVAAMETADLA 642 

Query: 115 ALIEALVLAL VEALVLADVEAL IEALVLALVEALVLADVEAL I EALVLALVE 166 

LA +ALAA+ I + A ++ L AD+ AL A + + 

Sbjct: 643 KLSAATFKGFSSTQITALTTAQAGAIGTDQIAQITTAAIKGLESADIAALANATLAKMTT 702 


Query: 167 ALVLADVFJUjlEALVLADVD-ALVLALVEALVI^^ 225 

AV A+L ++LA V+AL A + L ++ AL AL V 

Sbjct: 703 AQVAVLGSAQLTGLTTTQIimLTTAQVKALGAAALAGLGTDDIVALTTGQAAALSSTQV 762 

Query: 226 EALILALvTiMjVLADvDALMEALVIiADVE 271

AL A + AL AD AL A + +AL +DAL A 

Sbjct: 763 AALSTAQISALQTADFAALSTAAIKGLSSTQITALSTGQIDALTTA 808 

No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 1309 

A DNA sequence (GBSx1389) was identified in S.agalactiae <SEQ ID 4009> which encodes the amino
acid sequence <SEQ ID 4010>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.2297 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1310 

A DNA sequence (GBSx1390) was identified in S.agalactiae <SEQ ID 4011> which encodes the amino
acid sequence <SEQ ID 4012>. This protein is predicted to be fimbriae-associated protein Fap1. Analysis of
this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.3138 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA97453 GB:AB029393 streptococcal hemagglutinin [Streptococcus gordonii] 
Identities = 388/968 (40%) , Positives = 518/968 (53%) , Gaps = 68/968 (7%) 

Query: 13 VDTKSR\raHKSEKNWWTVMSHFNLFKAIKGRATvI^VCIQDvEKEDRLSSGNLTYLK 72 

V+ +R K+ KS K+W+R S F L + +KG +V V +E + G L YLK 
Sbjct: 13 VERVTRFKLIKSGKHWLRAATSQFGLLRLMKGADISSVEV KVAEEQSVEKGGLNYLK 69 

Query: 73 GILAAGALVGGASLTSR-VYADETPWQEQSSSVPTLAEQTEVTV--KTTTVQNHQDGTV 129 

GI+A GA++GGA +TS VYA+E +++ + LA + E + + T+ + 
Sbjct: 70 GIIATGAVLGGAVVTSSSVYAEEEQALEKVIDTRDVLATRGEAVLSEEAATTLSSEGANP 129 

Query: 130 SKNIIDSNSVSMSESASTSTSESVSMSMSGSTLTSVSESVSTSALTSASESISTSASESV 189 

+++ D+ S S S SA+ S S S+S+S S S S S S S+S S+SES S S S SV 
Sbjct: 130 VESLSDTLSASESASAN-SVSTSISISESFSVSASASLSSSSSLSQSSSESASASESLSV 188 

Query: 190 SKSTSISEVSNILETQASLTDKGRESFSANQIVTESSLVTDAGKNASVSSIjIEITKPKSE 249 

S STS S S TQ+S + S S+N + T S V+ +NA V + + +E 
Sbjct: 189 SASTSQSFSSTTSSTQSS^I^IESLISSDS3NSLNT^IQS-VSARNQNARVRTRRAVAANDTE 247 

Query: 250 LQTSKMSNESLITPEKSQVMIASDKTGN3SLTPTIRLKSVIQPRSMNLMTLSSEMDLIPL 309 

K + + E + ++ T N + ++ N+ ++ LP 

Sbjct: 248 APQVKSGDYWYRGESFEYY- -AEITDNSGQVNRWIR NVEGGANSTYLSPN 297 

Query: 310 EEVSDTEMLGKDVSSELQKVNIALKDNTLSEPGTVKLDSSENLVLNFAFSIASVNEGDVF 369 

TE LG+ ++ +Q L+ E ++ + ++ + +A G+ 

Sbjct: 298 WVKYSTENLGRPGNATVQN PLRTRI FGEVPIiNEIVNEKSYYTRYI - -VAWDPSGN- - 350 

Query: 370 TVTO.SDNLDTCGIGTILKVQDIMDETGQLLATGSYSPLTHNITY TWTRYAST 421 

++ DN + G+ + +E Y P ++TY T R A 

Sbjct: 351 ATQMVDNANRNGLERFVLTVKSQNE KYDPAESSVTTVNNLSNLSTSEREAVA 402 

Query: 422 LNNIKARVNMPVWPDQRI 1 SKTTSDKQCFTATLNNQVAS IE ERVQYNSPS 471 

A N+P P +1 ++ T DK T N V ++ S S 

Sbjct: 403 AAVRAANPNIP--PTAKITVSQNGTVTITYPDKSTDTIPANRVVKDLQISKSNSASQSSS 460 

Query: 472 VTEHTNVKTNWSRIMKLDDERQTETYITQINPEGKEMYFASGLGNLYTIIGSDGTSGSP 531 
           V+ + T+V + I ++ + + ++ S+ S S

Sbjct: 461 VSASQSASTSVSAS I SASMSASVSVSTSASTSASVSASESASTSASVSASESASTS- 516 

Query: 532 VNLLNAEWILKTNSKNLTDSMDQ^DSPEFEDVTSQYSYTMDGSKITIDWKTNSISSTT 591 

A V K++S + + S+4+ + S+S + S+S++T 

Sbjct: 517 ASVSASKSSSTSASVSASESASTSASVSASESASTSASVSASESASTSASVSAST 571 

Query: 592 SYWLVKIPKQSGVLYSTVSDINQTYGSKYSyGHTNISGDSElANAEIKL-LSESASTSAS 650 

S + ST + ++ + + S ++S A+ + SESASTSAS 
Sbjct: 572 SASTSASVSASESA- - STSASVSASESASTS ASVSASESASTSASVSASESASTSAS 626 

Query: 651 TSASTSASMSASTSAST3ASMSASTSASTSASTSASMSASTSASTSASTSASTSASTSAS 710 

SAS S+S SAS SAS SAS SAS SAS SASTSAS+SASTSASTSAS SASTSASTSAS 
Sbjct: 627 VSASESSSTSASVSASESASTSASVSASESASTSASVSASTSASTSASVSASTSASTSAS 686 

Query: 711 MSASTSASTSASTSASTSASTSASTSASMSASTSASTSASTSASTSASMSASTSASTSAS 770 

+ SASTSASTSAS SAS SASTSAS SAS SASTSAS SASTSASTSAS+SASTSASTSAS 
Sbjct: 687 VSASTSASTSASVSASESASTSASVSASESASTSASVSASTSASTSASVSASTSASTSAS 746 

Query: 771 TSASTSASMSASTSASTSASTSASTSASMSASTSASTSASTSASTSASMSASTSASTSAS 830 
SAS SAS SAS SASTSASTSAS SAS SASTSAS SAST ASTSAS+SAS SASTSAS 

Sbjct: 747 \ 



Query: 831 1 

SAS SASTSAS SAS SASTSAS SAS SASTSAS SAS SASTSAS SAS SASTSAS 
Sbjct: 807 VSASESASTSASVSASTSASTSASVSASESASTSASVSASESASTSASVSASESASTSAS 866 

Query: 891 MSATTSASTSVSTSASTSASTSASTSSSSSVTSNSSKEKVYSALPSTGDQDYSVTATALG 950 

+SA+TSASTS S SAS SASTSAS S+S S ++++S SA S +T+ 

Sbjct: 867 VSASTSASTSASVSASESASTSASVSASESASTSASVSASESASTSASVSASESASTSAS 926 

Query: 951 LGLMTGAT 958 
+ T A+ 
Sbjct: 927 VSASTSAS 934

There is also homology to SEQ ID 760. 

SEQ ID 4012 (GBS68) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 33 (lane 4; MW 131.2kDa).

GBS68d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 153 (lane 14; MW 103kDa) and in Figure 239 (lane 13; MW 103kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 152 
(lane 17; MW 78kDa), in Figure 153 (lane 17; MW >78kDa) and in Figure 184 (lane 10; MW 78kDa). 
Purified GBS68d-GST is shown in Figure 246, lane 5. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
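
The alignment summaries quoted in these examples (for instance, Identities = 388/968 (40%) in Example 1310) follow a fixed count/length (percent) pattern, so they can be read mechanically. The following is a minimal Python sketch, not part of the original analysis; the names FIELD, parse_summary and ratio_percent are hypothetical. It extracts the counts and recomputes the exact ratio; the printed whole-number percentages may differ slightly from a freshly computed value, depending on how the alignment program rounded.

```python
import re

# Hypothetical helper pattern: one "Name = count/length (percent%)" field
# of an alignment summary line, tolerant of stray spaces.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, length, printed_percent)} for one summary line."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


def ratio_percent(count, length):
    """Exact ratio as a percentage, before any rounding or truncation."""
    return 100.0 * count / length


if __name__ == "__main__":
    line = "Identities = 388/968 (40%) , Positives = 518/968 (53%) , Gaps = 68/968 (7%)"
    for name, (count, length, printed) in parse_summary(line).items():
        exact = ratio_percent(count, length)
        print(f"{name}: {count}/{length} = {exact:.2f}% (printed {printed}%)")
```

Keeping the printed percentage alongside the recomputed ratio makes it easy to spot summary lines whose digits were damaged during scanning.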

Example 1311 

A DNA sequence (GBSx1391) was identified in S.agalactiae <SEQ ID 4013> which encodes the amino
acid sequence <SEQ ID 4014>. This protein is predicted to be RofA. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



WO 02/34771
PCT/GB01/04789



A related GBS nucleic acid sequence <SEQ ID 10005> which encodes amino acid sequence <SEQ ID 
10006> was also identified. 

There is also homology to SEQ ID 3750. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1312 

A DNA sequence (GBSx1392) was identified in S.agalactiae <SEQ ID 4015> which encodes the amino
acid sequence <SEQ ID 4016>. This protein is predicted to be Nra. Analysis of this protein sequence 
reveals the following: 

Possible site: IS 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 3750. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
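
The localisation predictions reported under Final Results list one certainty value per compartment. As an illustrative sketch only (the helper names below are assumptions, not part of the prediction software), such lines can be parsed and the highest-scoring compartment selected as follows. The pattern tolerates the stray spaces that appear inside numbers in scanned text.

```python
import re

# Hypothetical pattern for one localisation line, e.g.
# "bacterial cytoplasm --- Certainty=0 . 3138 (Affirmative)".
LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([0-9][0-9 .]*?)\s*\(\s*(Affirmative|Not Clear)\s*\)",
    re.IGNORECASE,
)


def parse_localisation(text):
    """Return {compartment: (certainty, verdict)} for a Final Results block."""
    results = {}
    for place, number, verdict in LINE.findall(text):
        # Remove spaces that scanning inserted inside the number.
        results[place.lower()] = (float(number.replace(" ", "")), verdict)
    return results


def most_likely(results):
    """Name of the compartment with the highest certainty value."""
    return max(results, key=lambda place: results[place][0])


if __name__ == "__main__":
    block = """
    bacterial cytoplasm Certainty=0 . 3138 (Affirmative)
    bacterial membrane Certainty=0 . 0000 (Not Clear)
    bacterial outside --- Certainty=0 . 0000 (Not Clear)
    """
    parsed = parse_localisation(block)
    print(parsed)
    print(most_likely(parsed))
```

Lines whose digits are too damaged to match the pattern are simply skipped, which is safer than guessing a value.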

Example 1313 

A DNA sequence (GBSx1393) was identified in S.agalactiae <SEQ ID 4017> which encodes the amino
acid sequence <SEQ ID 4018>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3674 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA27020 GB:M80215 uvs402 protein [Streptococcus pneumoniae] 
Identities = 577/663 (87%), Positives = 633/663 (95%), Gaps = 1/663 (0%)

Query: 1 MIDRKDTNRFKLVSKYSPSGDQPQAIETLTONISGGEKAQILKGATGTGICrYTMSQVIAQ 60 

MI+ N+FKLVSKY PSGDQPQAIE LVDNIEGGEKAQIL GATGTGKTYTMSQVI++ 
Sbjct: 7 MINH1TDNQFKLVSKYQPSGDQPQAIEQLVDNIEGGEKAQILMGATGTGKTYTMSQVISK 66 

Query: 61 VNKPTLVIAHNKTIAGQLYGEFKEFFPDNAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 120 

VNKPTLVIAHNKTLAGQLYGEFKEFFP+NAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 
Sbjct: 67 VNKPTLVIAHNKILAGQLYGEFKEFFPENAVEYFVSYYDYYQPEAYVPSSDTYIEKDSSV 126 



Query: 121 K 

NDEIDKLRHSATS+LLERNDVIWASVSCIYGLG8PKEYADSWSLRPG EISRD+LI1N+ 
Sbjct: 127 NDEIDKLRHSATSALLERNDVIVvASVSCIYGLGSPKEYADSWSLRPGLEISRDKLLND 186 

Query: 181 LVDIQFERNDIDFQRGKFRTOGDvVEVFPASRDEHAFRIEFFGDEIDRIREIESLTGRVL 240 

LVDIQFERNDIDFQRG+FRVRGDWE+FPASRD3HAFR+EFFGDEIDRIRE+E+LTG+VL 
Sbjct: 187 LVDIQFERNDIDFQRGRFRVRGDWEIFPASRDEHAFRVEFFGDEIDRIREVEALTGQVL 246 
Sbjct: 247
Query: 301
Sbjct: 307
Query: 361
Sbjct: 367
Query: 421
Sbjct: 427
Query: 481
Sbjct: 487
Query: 541
Query: 601
Sbjct: 607
Query: 661
Sbjct: 666



MLREMGYTNGVENYSRHMD3RSEGEPP+TLLDFFP4DFLIMIDESHMTMGQIKGMYNGDR 



SRK+MLVNYGFRLPSALDNRPLRREEFESHVHCI VYVSATPGDYE EQT+TV+EQI IRPT 



GLLDPEVEVRP+MGQ+DDLLGEIN R EK ERTFITTLTK+MAEDLTDY KEMG+KVKYM 



QTIGRAARNS GHVIMYAD +T SMQRA+DETARRR++QM YNE+HGI VPQTIKKEIRDL 



++ +K VD +SL4K+ER+ +K L++QMQEA E+LDFELAAQIRD++LE+K 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4019> which encodes the amino acid
sequence <SEQ ID 4020>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.4386 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 570/663 (85%), Positives = 625/663 (93%)



Query: 1
Sbjct: 1

Query: 61
Sbjct: 61

Query: 121
           NDEIDKLRHSATSSLLERNEVIWASVSCIYGLGSPKEyADS VSLRPGQEISRD LEN
Sbjct: 121 NDEIDKLRHSATSSLLERNDVIWASVSCIYGLGSPKEYADSAVSLRPGQEISRDTLLNQ 180

Query: 181
           LVDIQFERNDIDFQRG FRVRGDWEVFPASRDEHAFR+EFFGDEIDRI EIESLTG+ H
Sbjct: 181

Query: 241 GEVEHIAIFPATHFMTNDEHMEEAlSKICiffirffiNQVELFEKEGKLIEAQRIRQRTEYDIE 300
           GEV+HL +FPATHF+TNDEHME++I+KIQAE+ Q++LFE EGKL+EAQR+RQRTEYDIE
Sbjct: 241 tTHFVTNDEHMEQSIAKIQAELAEQLQLFESEGKLLEAQRLRQRTEYDIE 300

Query: 301 MLREMGYTNGVENYSRHMDGRSEGEPPFTLLDFFPEDFLIMIDESHMTMGQIKGMYNGDR 360
           MLREMGYT+GVENYSRHMDGRS GEPPtTLLDFFPEDFLIMIDESHMTMGQIKGMYNGD+
Sbjct: 301 MLREMGYTSGVENYSRHMDGRSPGEPPYTLLDFFPEDFLIMIDESHMTMGQIKGMYNGDQ 360

Query: 361
Sbjct: 361

Query: 421 GLLDPEVF^PSMGQMDDLLGEINLRTEKGERTFITTLTKRI'IASDLTDYLKEMGVKVKifM 480
           GLLDPE++VR SMGQMDDLLGEIN R + ERTFITTLTK+MAEDLTDYLKEMGVKVKYM
Sbjct: 421 GLLDPEIDVRSSMGQMDDLLGEINQRVARDERTFITTLTKKMAEDLTDYLKEMGVKVKYM 480

Query: 481 HSDIKTLERTEIIRDLRLGVFDVLIGINLLREGIEVPEVSLVAILDADKEGFLRNERGLI 540
           HSDIKTLERTEIIRDLRLGVFDVLIGINLLREGIDVPEVSLVAILDADKEGFLRNERGLI
Sbjct: 481 HSDIKTLERTEIIRDLRLGVFDVLIGINLLREGIDVPEVSLVAILDADKEGFLRNERGLI 540

Query: 541 QTIGRAARNSNGHVIMYADKITDSMQRAMDSTARRRRLQMDYNEKHGIVPQTIKKEIRDL 600
           QTIGRAARN +GHVIMYADK+TDSMQRA+DETARRR +Q+ YN+ HGIVPQTIKK+IR L
Sbjct: 541 QTIGRAARNVDGHVIMYADKMTDSMQRAIDETARRREIQIAYNKAHGIVPQTIKKDIRGL 600

Query: 601 L IAITKSOTDSDKPEKOTDYSSLSKKERQAEIKRLQQQMQEAAELLDFELAAQIRDVILELK 660
           I+I+K++ +D ++ +DY S+S+ ER+ I ALQ+QMQEAAELLDFEIAAQ+RD+ILELK
Sbjct: 601 DLILELK 660

Query: 661 AID 663
Sbjct: 661 LMD 663



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1314 

A DNA sequence (GBSx1394) was identified in S.agalactiae <SEQ ID 4021> which encodes the amino
acid sequence <SEQ ID 4022>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.78
INTEGRAL Likelihood =-10.08
INTEGRAL Likelihood = -5.52
INTEGRAL Likelihood = -5.15
INTEGRAL Likelihood = -3.29
INTEGRAL Likelihood = -1.54
INTEGRAL Likelihood = -0.48

Transmembrane 203 - 219 ( 201 - 225)
Transmembrane 183 - 199 ( 182 - 200)

Final Results

bacterial membrane --- Certainty=0.5713 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA22372 GB:AL034446 putative transmembrane protein
[Streptomyces coelicolor A3(2)]
Identities = 58/190 (30%), Positives = 96/190 (50%), Gaps = 11/190 (5%) 

Query: 114 GWS--IGFILFSISVITAYILGGLDFHSYDVSK-ATIFYWTLLPFWLIQSGTEELLTRG 170 
           GW IGF LF +VIT G Y+V ++ + L+ F + TEE++ RG

Sbjct: 98 GWGTLIGFGLFG-AVITNLFASGY YEvDGLGSVQGAIGLVGFMAAAAATEEWFRG 152 

Query: 171 WLLPLINHRFHLAVaiGVSSTLFGILHLVNAHVTFLSIVSI-ICSGVLMSLYMIKSGNIW 229 
L +1 +A+G++ +FG++HL+N T ++I I +G +++ + N+W 

Sbjct: 153 VLFRIIEEHIGTYLTALGLTGLVFGLMHLLNEDATLWGAIAIAIEAGFMLAAAYAaTRNLW 212

Query: 230 SVAALHGAVMFSC^NLYGIAVSGQKAGASLIjHFTVKENAPDWISGGAFGIEGSLISIFVL 289 
Query: 290 LAAIIYLLWL 299

+ + LWL 
Sbjct: 271 VLLTLVFLWL 280 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1315 

A DNA sequence (GBSx1395) was identified in S.agalactiae <SEQ ID 4023> which encodes the amino
acid sequence <SEQ ID 4024>. This protein is predicted to be glutamine-binding periplasmic 
protein/glutamine transport system perme. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.97 Transmembrane 532 - 548 ( 523 - 553) 

INTEGRAL Likelihood = -7.38 Transmembrane 700 - 716 ( 696 - 720) 

INTEGRAL Likelihood = -4.57 Transmembrane 562 - 578 ( 558 - 588) 

INTEGRAL Likelihood = -0.32 Transmembrane 665 - 681 ( 665 - 681) 

Final Results 

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF16724 GB:AF141644 putative integral membrane protein
[Lactococcus lactis] 
Identities = 109/195 (55%), Positives = 156/195 (79%), Gaps = 4/195 (2%)

Query: 466 KMFNNGLASLKKSGEYDKLVKKYLSTASTSSNDKAAKPVDESTILGLISNNYKQLLSGIG 525 

+MFNNGLA+L+ +GEYDK++ KYL++ T + +AK E+T G++ NN++Q+ G+ 
Sbjct: 1 EMFNNGLANLRANGEYDKIIDKYLAS-DTKTIQSSAK ENTFFGILQNNWEQIGRGLL 56 

Query: 526 TTLSLTLISFAIAMVIGIIFGMMSVSPSNTLRTISMIFVDIVRGIPLMIVAAFIFWGIPN 585

TL L ++SF +AM++GIIFG+ SV+PS LRTI+ I+VD+ R IPL+++ FIF+GIPN 
Sbjct: 57 VTLELAVLSFILAMIVGIIFGLFSVAPSKILRTIARIYVDLNRSIPLLVLTIFIFYGIPN 116 

Query: 586 LIESITGHQSPINDFVAATIALSLNGGAYIAEIVRGGIEAVPSGQMEASRSLGISYGKTM 645 

L++ ITGHQSP+N+F A IAL+LN AYIAEIVR G++AVPSGQMEASRSLG++Y +M 
Sbjct: 117 LLQIITGHQSPLNEFTAGVIALTLNSSAYIAEIVRSGVQAVPSGQMEASRSLGVTYLTSM 176 

Query: 646 QKVILPQAVRLMLPN 660 

+KVILPQA+++ +P+ 
Sbjct: 177 RKVILPQAIKITIPS 191 

There is also homology to SEQ ID 1198.

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9071> which encodes amino acid 
sequence <SEQ ID 9072>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 



Query: 34 IKKTRKLVVAVSPDYAPFEFKALVNGKDTIVGADVQLAQAIADEIiDVDLELSPMSFDNVL 93 

+K + K+V S +APFE++ NGK G D++L + IA + L++S FD h 
Sbjct: 268 VKPSYKIVSDSS - - FAPFEYQ NGKGKYTGFDMELI KKIAKQQGFKLDI SNPGFDAAL 322 

Query: 94 SSLQTGKADIAISGISHTKERAKVYDFSIPYYQAENAIVMRASDAKVTKNISDLNGKKVA 153 

+++Q+G+AD I+G + T+ R K++DFS PYY +++++ K+ DL GK V 

Sbjct: 323 NAVQSGQADGVIAGATITEARQKIFDFSDPYY--TSSVILAVKKGSNVKSYQDLKGKTVG 380 

Query: 154 AQKGSIEEGLVKIQLPKANLISLTAMGEA- - - INELKAGQVYAVTLEAPVAAGFLAQHKD 210 

A+G+ + K N +AEA 4- + +G + A+ + VA+Q + 

Sbjct: 381 AKNGTASYTWLSDHADKYN-YHVKAFDEASTMYDSMNSGSIDAIiMDDEAVIiAYAINQGRK 439 

Query: 211 LALAPFSLKTSDGDAKAVALPKNSGDLTKAVNKVIAKLDEQERYKSFIAETIA 263 

P +SGD ++LKN+AL+ Y + + ++ 

Sbjct: 440 FE-TPIKGEKS-GDIGFAVKKGANPELIKMFNWSLASLKKSGEYDKLVKKYLS 490 
Score = 74.5 bits (180), Expect = 1e-15

Identities = 59/215 (27%), Positives = 102/215 (47%), Gaps = 12/215 (5%)

Query: 48 YAPFEFKALVNGKDT I VGADVQLAQAI ADELDVDLELS PMS FDNVLSSLQTGKADLAI SG 107 

YAPFEFK + T G DV + +A ++ ++ FD ++++Q+G+AD ++G 

Sbjct: 36 YAPFEFK- --DSDQTYKGIDVDIVTSIEVAKRAGWNVNMTYPGFDAAVNAVQSGQADALMAG 92 

Query: 108 ISHTKEPAKVYDFSIPYYQAENAXVMRASDAKVTKNISDLNGKKVAAQKGSIEEGLVKIQ 167 

+ T+ R KV++FS YY +1+ ++ KVT N Ii SK V + G+ + ++ 
Sbjct: 93 TTVTEARKKVFWFSDTYYDT-SVILYTKl^KOT-iraCQLKGKWGVKNGTAAQSFLEEN-150 

Query: 168 LPKANLISLTAMGEAI - -NELKAGQWAVTLEAPVAAGFLAQHKDIALAPFSLKTSDGDA 225 

K T + N L +G +YA + PV + Q K A+ +++ + 
Sbjct: 151 KSKYGYKVKTFDTSDLMNNSLDSGSIYAAMDDQPWQFAINQGKAYAI WMEGEAVGS 207 

Query: 226 KAVALPKNSG- -DLTKAWKVIAKLDEQERYKSFI 258 

A A+ K SG +L K N A++ Y + 

Sbjct: 208 FAFAVKKGSGHDNL I KEFNTAFAQMKSDGTYNDIM 242 

SEQ ID 4024 (GBS154) was expressed in E.coli as a His-fusion product. The purified protein is shown in 
Figure 199, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1316 

A DNA sequence (GBSx1396) was identified in S.agalactiae <SEQ ID 4025> which encodes the amino
acid sequence <SEQ ID 4026>. This protein is predicted to be amino acid ABC transporter, ATP-binding 
protein (glnQ). Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4183 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB90561 GB:AE001058 glutamine ABC transporter, ATP-binding
protein (glnQ) [Archaeoglobus fulgidus]
Identities = 147/240 (61%), Positives = 192/240 (79%)

Query: 5 KIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLESITSGKVW 64
         ++++ DLHK 4G+ EVLKG+ K +G+W IIGPSGSGKST LR +N LE TSGK+++
Sbjct: 3 QLEIIDIjHKRFGELEVLKGVTMKVEKGEWVIIGPSGSGKSTLLRCINRLEEPTSGKILL 62

Query: 65
          J +++N K DI+K R+ IG+VFQ FNLFPH++ L+N+T API++ K SK AE+ GM L
Sbjct: 63

Query: 125
           LEKVGL DKA+ P LSGGQ+QRVAIAR+LAKNP+++LFDE TSALDPE+V +VL+VMK
Sbjct: 123

Query: 185
           LA GMTM+ +VTHEMGFAR+V +RVIF DGG +E+G PEQIF P+H R + FL+ +L
Sbjct: 183 QLARDGMTMVVVTHEMGFAREVGDRVIFMDSGVIVEEGKPEQIFSNPKHERTRKFLSMIL 242

A related DNA sequence was identified in S.pyogenes <SEQ ID 4027> which encodes the amino acid 
sequence <SEQ ID 4028>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4149 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05180 GB:AP001512 ABC transporter (substrate-binding protein) 
[Bacillus halodurans] 
Identities = 79/227 (34%), Positives = 126/227 (54%), Gaps = 10/227 (4%) 

Query: 35 KKTRKLWAVSPDYAPFEFKALVNGKDTIVGADVQLAQAIADELDVDLELSPMSFDNVLS 94

+K LV+ S DY P+E + G+ IVG DV +A+ I EL +L++ M F+ ++ 
Sbjct: 48 EKKSVLVMGTSADYPPYESVDVTTGE--IVGFDVDIAEYITSELGYELKIQDMDFNGIIP 105 

Query: 95 SLQTGKADLAISGISHTKEPAKVYDFSIPYYQAENAIVMRASDAKVTKNISDLNGKKVAA 154 
          +LQ G+ D A+SG++ T+ER K DFS YY A+N +V + D ++ DL GK V

Sbjct: 106 ALQAGRVDFALSGMTPTEERKK.SVDFSDVYYDAQNLWFKEEDG--LSSVEDIjAGKTVGV 163 

Query: 155 QKGSI -EEGLVKIQ- -LPKANLISLTAMGEAINELICAGQVYAVTLEAPVAAGFLAQHKDL 211 
Q SI EE V++Q L + + + E + EL AG+V A+ +E VAAG L + 
Sbjct: 164 QLASIQEEAAVELQEELDGLTIETRNRVPELVQELLAGRVDALI IEDTVAAGHLEANP - - 221

Query: 212 ALAPFSLKTSDGDAKAVALPKNSGDLTKAVNKVIAKLDEQERYKSFI 258

L F++++ A+A PK+S +LT+ N+ + ++ E +1 

Sbjct: 222 GLVRFAIESEGETGSAIAFPKDS-ELTEPFNEKLQEMMEDGTMEELI 267 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 223/246 (90%), Positives = 238/246 (96%)

Query: 1 MAELKIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLESITSG 60 
         M ELKIDVQDLHKSYGQNE VLKGIDAKFYEGDWCI IGPSGSGKSTFLRTLNLLE4 ITSG

Sbjct: 1 MTELKIDVQDLHKSYGQNEVLKGIDAKFYEGDWCIIGPSGSGKSTFLRTLNLLETITSG 60 

Query: 61 KOTVDGFELSNPKTDIDKARENIGIWFQflFNLFPHMSVLENITFAPIELGKESKEAAEKH 120 
KV+VDGFELS+PKT+IDKAREMIGMVFQHFNLFPHM+VLENI FAP+ELGKESKE A+KH 
Sbjct: 61 KVMvDGFELSDPIONIDKARF^IGMVFQHFNLFPHMTVLF^IIFAPVELGKESKEVAKKH 120

Query: 121 GMELLEKVGLADKANAKPDSLSGGQKQRVAIARSIjAMNPDILLFDEPTSALDPEMVGDVL 180 

GM LLEKVGL+DKA+A P SLSGGQRQRVAIARSIAMNPDI+LFDEPTSALDPEMVGDVL 
Sbjct: 121 GMALLEKVGLSDK^JDAFPGSLSGGQKQRVAIARSLAMNPDIMLFDEPTSALDPE^WGDVL 180 
Query: 181 NVMKDLAEQGMTMLIVTHEMGFARQVANRVIFTDGGRF1.SD3TPEQIFDTPQHPRLQDFL 240

JnTVMKDLAEQGMTMIjIVTHEMGFARQVANRVIFTDGG+FLEDGTPE+IFD P+HPRL +FL 
Sbjct: 181 l^KDLflEQGMTMLIWHEI^FARQVAM?VIFTDG(^FLEDGTPEEIKDHPiCHPRLIEFL 240 

Query: 241 NKVRW 246 

+KVLNV 
Sbjct: 241 DKVLNV 246 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1317 

A DNA sequence (GBSx1397) was identified in S.agalactiae <SEQ ID 4029> which encodes the amino
acid sequence <SEQ ID 4030>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2311 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4031> which encodes the amino acid
sequence <SEQ ID 4032>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2702 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 45/51 (88%), Positives = 49/51 (95%)

Query: 1 MGDKPISFRDKDGNFVSAADVl'INAEKLEELFNTLNPNRKLRLEREKLAKEK 51 

MGDKPISF+DKDGNFVSAADVWNAEKLEELFN IiNPNR-t-LRLEREKL K++ 
Sbjct: 1 MGDKPISFKDKDGNFVSJAADVl'rNAEKLEELFNLLNPNRRLRLEREKLKKDE 51

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1318 

A DNA sequence (GBSx1398) was identified in S.agalactiae <SEQ ID 4033> which encodes the amino
acid sequence <SEQ ID 4034>. This protein is predicted to be spo0B-associated GTP-binding protein (obg).
Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2967 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

Query: 243
Sbjct: 241
Query: 303
Sbjct: 301
Query: 363
Sbjct: 354
Query: 423
Sbjct: 414


++FKA GE GM+K HGR A+D+++ +PPGT V D T +VI DL EH Q V+ARGG 



RGGRGN RFATP NPAP+++ENGEPG+ER + LELK+LADVGLVGFPSVGKSTLLSWS+ 



AKPKI YHFTT+VPNLGMV T G SF MADLPGLIEGA QGVGLG QFLRHIERTRVI 



+HVIDMS EGRDPYDDY++IN EL YNLRL ERPQI IVANKMDMP++ ENL AFKEKL 



+E PF ITRD D +VL GD LE+LF MT+ RDES+ +FARQ+RGMGVDEALRERGAKD 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4035> which encodes the amino acid 
sequence <SEQ ID 4036>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2588 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 394/437 (90%), Positives = 421/437 (96%)

Query: 1 MSMFLDTAKISVKAGRGGDGMVAFRR3KYVPNGGPWGGDGGKGGSVIFKVNEGLRTLMDF 60 

MSMFLDTAKISV+AGRGGDGMVAFRREKYVPNGGPWGGDGGKGGSVIF+V+EGLRTLMDF 
Sbjct: 1 MSMFLDTAKISVQAGRGGDGMVAFRREKYVPNGGPWGGDGGKGGSVIFRVDEGLRTLMDF 60 

Query: 61 RYNRNFKAKAGEKGMTKGMHGRGAEDLIVSLPPGTTVRDATTGKVITDLVEHDQEFWAR 120 

RYNR FKAK+GEKGMTKGMHGRGAEDIjIV +P GTTVRDA TGKVITDLVEH QE V+A+ 
Sbjct: 61 RYNRKFKAKSGEKGMTKG^GRGAEDLIVFVPQGTTVRDAETGKVITDLVEHGQEWIAK 120 

Query: 121 GGRGGRGNIRFATPRNPAPEIAENGEPGEERELQLELKILADVGLVGFPSVGKSTLLSW 180 

GGRGGRGNIRFATPRNPAPEIAENGEPGEER+L+LELKILADVGLVGFPSVGKSTLLSW 
Sbjct: 121 GGRGGRGNIRFATPRNPAPEIAEIJGEPGEERQLELELKILADVGLVGFPSVGKSTLLSW 180 



Query: 181 SAAKPKIGAYHFTTIVPNLGMVRTKSGDSFAMADLPGLIEGASQGVGLGTQFLRHIERTR 240 
S+AKPKIGAYHFTTIVPNLGMVRTKSGDSFAMADLPGLIEGASQGVGLGTQFLRHIERTR 
Sbjct: 181 SSAKPKIGAYHFTTIVPNLGMVRTKSGDSFAMADLPGLIEGASCGVGLGTQFLRHIERTR 240

Query: 241
           VILHVTDMSASEGRDPY+DYVS INH3LETYNLRLMERPQI IVANKMD+P+++ENL AFK+
Sbjct: 241 VILHVIDMSASEGRDPYEDWSINNELETTOLRLMERPQIIVANKMDIPEAQENLKAFKK 300

Query: 301 KIiAANYDEFDDMPMIFPISSrJUIQGLENLMDATAErjIiAlTTEEFLLYDETDMQEDEAYyGF 360
           KLAA YDEFDD+PMIFPISSLAHQGLENL++ATAELLA T+EFLLYDE+D+ ++EAYYGF
Sbjct: 301

Query: 361 NEDERPFEITRDDDATWVLYGDKLEKLFVNFTJJ3MERDESIMKFARQLRGIVIGVDEALRERGA 420
           E E+ FEITRDDDATWVL G+KLE+LFVMTMWERDESIMKFARQLRGMGVDEALRERGA
Sbjct: 361 AETEICDFEITRDDDATWVLSGEKLERLFVMINMERDESIMKFARQLRGMGVDEALRERGA 420

Query: 421 KDGDIVRIGNFEFEFVD 437
           KDGD VRIG FEFEFVD
Sbjct: 421 KDGDPVRIGKFEFEFVD 437





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1319 

A DNA sequence (GBSx1399) was identified in S.agalactiae <SEQ ID 4037> which encodes the amino
acid sequence <SEQ ID 4038>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4039> which encodes the amino acid
sequence <SEQ ID 4040>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 30/42 (71%), Positives = 37/42 (87%)

Query: 1 MAFGDNGQRKKTGFEKLTLFWILMVLVTVGGLVFGAISAIM 42 

+AFG+NG RKKT FEK+T+FWILMVLVTVGGL+ A+S +M 
Sbjct: 1 VAFGENGPRKKTTFE1WTMFWILMVLVTVGGLIASALSVLM 42 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1320 

A DNA sequence (GBSx1401) was identified in S.agalactiae <SEQ ID 4041> which encodes the amino
acid sequence <SEQ ID 4042>. Analysis of this protein sequence reveals the following: 

Final Results

bacterial cytoplasm --- Certainty=0.2484 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 ^QDFDNLLKIWAQLIISKGUWQKGHTLALTIDVEQVHLARLLTEAAYEKGASEVIVD 60 

MVL +F L+KYA+L+++ G+NVQ GHT+AL+IDVEQ LA LL + AY GA+EVIV 
Sbjct: 1 ITOPNFKENLEKYAKLLVTNGINVQPGHTVAIiSIDVEQAEIAHIiVKEAYALGAAEVIVQ 60 

Query: 61 YTDDFITRQRLLHASDEVLTNVPQYTVDKSLALLNKKASRLVVKSSNPNAFATVDPKRLS 120

++DD I R+R LHA + VP Y + LL KKASRL V+SS+P+AF V P+RLS 
Sbjct: 61 WSDDTINRERFLHAEMNRIEEVPAYKKAEMEYLLEKKASRLGVRSSDPDAFNGVAPERLS 120 

Query: 121 ETTRATAIALEEQSRAIQANKVSWNVAAAAGREWAALVFPELKTSDQQVDALVIDTIFKLN 180 

+A A + A Q+NKVSW VAAAAG+EWA VFP + ++ VD LW+ IFK 
Sbjct: 121 AHAKAIGAAFKPMQVATQSNKVSWTVAAAAGKEKAKKVFPNASSDEEAVDLLWNQIFKTC 180 

Query: 181 R1YEDDPIAAWDAHEAKLLEKATRLNQEQFDALHYTAPGTDLTLGMPKNHIWEAAGSLNA 240 

R+YE DP+ AW H +L KA LN+ QF ALHYTAPGTDLTLG+PKNH+WE+AG+ +NA 
Sbjct: 181 RVYEKDPVRAWKEHADRLDAKARILNEAQFSALHYTAPGTDLTLGLPKNHVVIESAGAINA 240 

Query: 241 QGETFIANMPTEEIFSAPDYRRADGYVTSTKPLSYAGVIIENMTFTFKDGKIINVTAEKG 300 

QGE+F+ NMPTEE+F+APD+RRA GYV+STKPLSY G HE + TFKDG+I+++TA++G 
Sbjct: 241 QGESFLPNMPTEEVFTAPDFRRAYGYVSSTKPLSYNGNIIEGIKVTFKDGEIVDITADQG 300 

Query: 301 QEWQRLIEEIffiGARSLGEVALVPHKTPISLSGLIFFNTLFDENASNHLAlGTAYAFNVE 360 

++ ++ L+ N+GAR+LGE ALVP +PIS SG+ FFNTLFDENASNHLAIG AYA +VE 
Sbjct: 301 EKVMKNLVFNNNGARALGECALVPDSSPISQSGITFFNTLFDENASNHLAIGAAYATSVE 360 

Query: 361 GGTEMTSQELDEAGLNRSSTHVDFMIGSEQMDIDGIRADGTAVPIFRNGEWAI 413 

GG +MT +EL AGLNRS HVDF+IGS QM+IDGI DG+ VPIFRNG+W I 
Sbjct: 361 GGADMTEEELKAAGLNRSDVHVDFIIGSNQMNIDGIHHDGSRVPIFRNGDWVI 413 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1321 

A DNA sequence (GBSx1403) was identified in S.agalactiae <SEQ ID 4045> which encodes the amino
acid sequence <SEQ ID 4046>. Analysis of this protein sequence reveals the following: 
Possible site: 33 

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -7.91 

Final Results 

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8787> which encodes amino acid sequence <SEQ ID 8788> 
was also identified. Analysis of this protein sequence reveals the following: 

GvH: Signal Score (-7.5): 1.01

Possible site: 29 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -7.91 threshold: 0.0 

INTEGRAL Likelihood = -7.91 Transmembrane 658 - 673 ( 657 - 680) 
PERIPHERAL Likelihood = 4.35 555 
modified ALOM score: 2.08 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 647-651 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF09821 GB:AE001885 6-aminohexanoate-cyclic-dimer hydrolase
[Deinococcus radiodurans] 
Identities = 150/497 (30%), Positives = 233/497 (46%), Gaps = 32/497 (6%)

Query: 110 LTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTREQEAIEEARK 169 

LT Y + D DLA + R G++++E++ A N +LNAV+ + + +AR 

Sbjct: 45 LTPAEYDRLDALDLAQLFRRGELSAEDMCTAAIHPJVQVVWALNAVVYPLYDQGLAQARA 104 

Query: 170 L KDTNQPFLGVPLLWGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYKDLG 222 

+ PF GVP LVK G + G G +1 +D V++++ G 

Sbjct: 105 TDAARARGEQATGPFAGVPFLVKDFGSRLAGVPHTGGTRAYRDQIPEWDDELVRRWQAAG 164 

Query: 223 FIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIASGMTPIASGSDA 282 

+ LG+TN PE+ +T+ +L+G T NPWDL GGSSGGSA+A+A+G+ P+A D 
Sbjct: 165 LLPLGIOTOPEFALMGVTEPELHGPTRNPWDLGRTPGGSSGGSASAVAAGIVPLAGAGDG 224 

Query: 283 GGSIRIPSSWTGLVGLKPTRGLV- — SNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 339 

GGSIRIP+S GL GLKP+RG V AV LT+S RD+ LL + D 

Sbjct: 225 GGSIRIPASCCGLFGLKPSRGRVPCGDGVGEPWQGAAVEHVLTRSvRDSAALLDLEQGPD 284 

Query: 340 QTLVSV NDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFK 386

+ L I ++ P+G V + A+ L G + 

Sbjct: 285 AGAALFLPSPERPYSEEVGREPGRLRIGFSTAHPLGRSVHPECVAAVQGAARLLESLGHE 344 

Query: 387 VTEIDLPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSD 446 

V E+ LP DG AL + + L G GA +D DV+ +TW + + ++ 

Sbjct: 345 VEEVALPWDGPAIAQAFLMLYFGETGASLAALRDTLGRPARASDVEAVTWLLGQLGRSYS 404 

Query: 447 KAELKKSIMEAQKHI'IDDYRKAMEKLHKQFPIFLSPTTASIiAPLNTDPY VTEEDKRA 502 

A+ A+ + + +AM + H+ + + L+P A+ PL V RA 

Sbjct: 405 AAD FAAARASWNVHARAMGRFHQNYDLLLTPVLAT-PPLQIGELQPRGVQAJILLRA 459 

Query: 503 IYNMENLSQEERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMA 562 

M+ R + +L + P+TQ+AN+TG PA+S+P + + GLP+G +A 

Sbjct: 460 AQQMDVSGLLRRSGQVDALATDILEKMPYTQLANLTGQPAMSVPLHWTADGLPVGVQFVA 519 

Query: 563 GANYDMVLIKFATFFEK 579 

+ VL++ A E+ 
Sbjct: 520 PLAREDVLLRLAGQLEQ 536 

There is also homology to SEQ ID 4048. 

SEQ ID 8788 (GBS173) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 41 (lane 5; MW 96.8kDa). 

The GBS173-GST fusion product was purified (Figure 116A; see also Figure 201, lane 7) and used to
immunise mice (lane 1+2 product; 15µg/mouse). The resulting antiserum was used for Western blot, FACS,
and in the in vivo passive protection assay (Table III). These tests confirm that the protein is
immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1322 

A DNA sequence (GBSx1404) was identified in S.agalactiae <SEQ ID 4049> which encodes the amino
acid sequence <SEQ ID 4050>. This protein is predicted to be ribosomal large subunit pseudouridine
synthase B (rsuA). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3674 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06992 GB:AP001518 16S pseudouridylate synthase [Bacillus halodurans] 
Identities = 110/236 (46%), Positives = 149/236 (62%), Gaps = 4/236 (1%) 



           MR+DKFL G GSR VK +LK

Query: 61 VYYMLHKPKGVISATDDPSHKTVLDLLDKTARDKAVFPVGRLDIDTTGLLLLTNNGEIAH 120
          VY M++KPKGVI AT+D H+TV+DLL + R PVGRLD DT GLLL+XN+G+ H
Sbjct: 61 WLM^KPKGVICATEDLEHETVIDIIiGEEERHYEPSPVGRLDKDTVGLLLITNDGKFNH 120

Query: 121 KMLSPKKHVDKCYEVKISGIMTEDDILAFDKG1ILKD-FTCLPALLEIVEVNQVKKQSLV 179
           ++SPK HV K Y + G +TE+D+ AF G++L D + PA L I+E +S +
Sbjct: 121 jVEGHVTEEDVGAFSHGWLDDGYVTKPATLHILEAG- - -ARSHI 177

A related DNA sequence was identified in S.pyogenes <SEQ ID 4051> which encodes the amino acid
sequence <SEQ ID 4052>. Analysis of this protein sequence reveals the following: 
Possible site: 40 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0152 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF09821 GB:AE001885 6-aminohexanoate-cyclic-dimer hydrolase
[Deinococcus radiodurans]
Identities = 177/485 (36%), Positives = 259/485 (52%), Gaps = 13/485 (2%)

Query: 5 DATAMAIAVQTGQTTPLELVTQAIYKAKKLNPTIiNAITSERFEAALEEAKQRDFSGL 61 

DA +A + G+ + ++ T AI++A+ +N IiNA+ ++ L +A+ D 4 
Sbjct: 54 DALDIAQLFRRGELSAEDMCTAAIHRAQWNVAIJIAWYPLYDQGLAQARATDAARARGE 113 

Query: 62 PFAGVPLFLKDLGQELKGHSSTSGSRLFKEYQATKTDLFVKRLEALGFIILGRSNT 117 

PFAGVP +KD G L G T G+R +++ D V+R +A G + LG++-NT 
Sbjct: 114 QATGPFAGVPFLVKDFGSRLAGVFHTGGTRAYREQIPEVJDDELVRRWQAAGLLPLGKTNT 173

3FOISDSSLHGPVNLPRDNTRNAGGSSGGAAALVSSGISALATASDGGGSIRIPAS 177 

+++ IiHGP P D R GGSSGG+A+ V++GI LA A DGGGSIRIPAS 
UjMGVTEPELHGPTRNPWDLGRTPGGSSGGSASAVAAGIVPDAGAGDGGGSIRIPAS 233 

jlGLKPSRGR^IPVGPGSYRSWQGASVHFALTKSVRDTRNLLYYLQMEQMESPFPLAT 237 
j GLKPSRGR+P G G WQGA+V LT+SVRD+ LL Q + h + 



ET A A + DT GRP D+E +TW + Q G+ 



ynrSATMASFHETYDLLLTFTTNTPAPKHGELVP- - -DSKLMANLAQAEIFSSEEQF 412 
h ++ M FH+ YDLLLT TP + GEL P + L+ Q ++ + 



PYT L NLTGQPA+S+P + T +GL +G+Q +A RED+LL +A 



Identities = 151/240 (62%), Positives = 183/240 (75%)

MRLDKFLVECGLGSRTQVKLILKKKQISV1SK3NSETSPKVQVDEYRDEIKYNGTLVSYEKF 6 0 
MRLDKFLV G+G+R+QVKL+LKKK I VN ETS K +DEY+D + Y GT + YE F 
MRLDKFLVATGVGTRSQVKLLLKKKAIFVNQKVETSAKAHIDEYKDLVTYQGTPLVYESF 61 



+LSPKKHV K Y K++GIMTE D F +GI LKD CLPA I 



ITI+EGKFHQVKRMVAACGKEVL+L+RL MG L+LD h G++RRLT +E++ L Y Q 



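Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
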
Example 1323 

A DNA sequence (GBSx1405) was identified in S.agalactiae <SEQ ID 4053> which encodes the amino
acid sequence <SEQ ID 4054>. Analysis of this protein sequence reveals the following: 

2 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2811 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10007> which encodes amino acid sequence <SEQ ID
10008> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

, Gaps = 3/277 (1%) 

Query: 26 TLSOTIiNIPKlGFGTWQLTEEESaYKAVTHMjKTOYTHIDTAQIYGlEHSVGRAIRDSGL 85 

TLSN + +P+ G G WQ GE AV AL GY HIDTA IY HE SVG +R SG+ 
Sbjct: 10 TLSNGVKMPQFGLGVWQSPAGEVTENAVNWALCA.GYRHIDTAAIYKNEESVGAGLRASGV 69 

Query: 86 ARESIFLTTK1WNDKHDYHLAKASIDESLQK1X3VDYIDLLIiIHWPNPKALRENDAWKAGN 145 

RE +F+TTK+WN + Y A+ +ES QKLGVDYIDL LIHWP K + + K 
Sbjct: 70 PREDVFITTKLWNTEQGYESTLAAFEESRQKIjGVDYIDLYIjIHWPRGKDILSKEGKKY-- 127 

Query: 146 AGTWKAMEEAYKEGKVKAIGVSNFMKHHLEALFETAEIKPMVNQIIIAPGCAQEDLVRFC 205 

+W+A E+ YKE KV+AIGVSNF HHLE + + PMVNQ+ LP Q DL FC 

Sbjct: 128 LDSWRAFEQLYKEKKVRAIGVSNFHIHHLEDVIAMCTVTPMWQVELHPLNNQADLRAFC 187 







Query: 206 KGNDILLEAYSPFGTGAIFENESIKAIAEKYGKSVAQVALRWSLDNGFLPLPKSATPKNI 265

I +EA+SP G G + N + AI KY K+ AQV LRW++ + +PKS + I
Sbjct: 188 DAKQIKVEAWSPLGQGKLLSNPILSAIGAKYNKTAAQVILRWNIQKNLITIPKSVHRERI 247

Query: 266 EANLDIFDFQLNEDDIATLIQLDSGIK-PKDPDNVSF 301

E N DIFDF+L +D+ ++ L++ + DPD F
Sbjct: 248 EENADIFDFELGAEDVMSIDALNTNSRYGPDPDEAQF 284



A related DNA sequence was identified in S.pyogenes <SEQ ID 779> which encodes the amino acid 

sequence <SEQ ID 780>. Analysis of this protein sequence reveals the following: 

Possible site: 27 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0980 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/282 (54%) , Positives = 204/282 (71%) , Gaps = 2/282 (0%) 

Query: 20 IVMErYTLSNTLNIPKIGFGTWQLTEGEEAYKAVTHALKVGYTHIDTAQIYGNEHSVGRA 79 

+++ T +++ IP +GFGT+Q +GEEAY++ A+K GY HIDTA IY NE SVGRA 
Sbjct: 1 VMVTTVKMTSGyEIPvLGFGTYQAADGEEAYQSTIAAIKAGYRHIDTAAIYKNEESVGRA 60 

Query: 80 IRDSGLARESIFLTTKIWNDKHDYH1AKASIDESLQKLGVDYIDLLLIHWPNPKALREND 139 

I+DSG+ RE +F+TTK+WND H Y AK ++ SI> +LG+DY+DL LIHWPNPKALR + 
Sbjct: 61 IKDSGvI i REDLFITTKLWNDAHSYEGAKDAIiAASLDRLGLDYVDLYIjIHWPNPKALR--N 118 

Query: 140 AWKAGNAGTWKAMEEAYKEGKVKAIGVSNFM^HLEALFETAEIKPMVNQIILAPGCAQE 199 

WK NA W+ MEEA + G +K+IGVSNFM HHLEAL ETA+I P +NQI LAPGC Q+ 
Sbjct: 119 TWKEANAQAWQYMEEAVEAGLI KS IGVSNFMVHKLEALQETAKITPAINQIRLAPGCYQK 178 

Query: 200 DLVRFCKGNDILLEAYSPFGTGAIFENESIKAIAEKyGKSVAQVALRWSLDNGFLPLPKS 259 

++V +CK N+ILLEA+SP G G IF+NE+++ +A KY K+VAQVAL WSL GF+PLPKS 
Sbjct: 179 E WDYCKANE I LLEAWSPLGQGEI FDNETMCflLANKYDKTVAQVALAWSLAEGFI PLPKS 238 

Query: 260 ATPKNIEANLDIFDFQLNEDDIATLIQLDSGIKPKDPDNVSF 301 

+ 1+ N+ IFD L ++D T+ L +PD SF 

Sbjct: 239 VHDERIKENMAIFDVSLTQEDKKTIRYLSGMSAIPNPDTTSF 280 
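
The summary line that heads each alignment (identities, positives, gaps) has a regular shape, so it can be read mechanically. The following is a minimal Python sketch and not part of the original analysis. The function name `parse_blast_summary` is hypothetical, and the pattern is deliberately tolerant of the stray spaces and the occasional "-" in place of "=" seen in this text.

```python
import re

# One "Name = n/total (pct%)" field; whitespace is allowed almost anywhere
# because the extracted text spaces these fields inconsistently.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*[=-]\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_blast_summary(line):
    """Return {field: (count, aligned_length, printed_percent)} for one summary line."""
    return {
        name.lower(): (int(count), int(total), int(pct))
        for name, count, total, pct in _FIELD.findall(line)
    }

summary = parse_blast_summary(
    "Identities = 155/282 (54%) , Positives = 204/282 (71%) , Gaps = 2/282 (0%)"
)
print(summary["identities"])
```

The printed percentage is kept as reported rather than recomputed, since the sketch makes no assumption about how the alignment program rounded it.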






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1324

A DNA sequence (GBSx1406) was identified in S.agalactiae <SEQ ID 4055> which encodes the amino
acid sequence <SEQ ID 4056>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0633 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10009> which encodes amino acid sequence <SEQ ID
10010> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12612 GB:Z99108 similar to NAD(P)H-flavin oxidoreductase
[Bacillus subtilis]
Identities = 105/223 (47%), Positives = 150/223 (66%), Gaps = 8/223 (3%)

Query: 29 DIKICQVRRAFDFRMAIRVYN-NNDIPKEDMEYILDTAWLSPSSVGLEGWRFLVLDRQTIA 87 

D+K Q+ A++FR A + ++ N + D E+IL+T LSPSS+GLE W+F+V+ 
Sbjct: 3 DLKTQILDAYNFRHATKEFDPNKKVSDSDFEFILETGRLSPSSLGLEPWKFVWQNP 59 

Query: 88 KFRDKLKEVAWGAQYQLDTASHFVLLLAE--KGAYYNADSMINSLIRRGLGDPAALESRI 145 

+FR+KL+E WGAQ QL TASHFVL+LA K YNAD + L E + 

Sbjct: 60 EFREKLREYTWGAQKQLPTASH FVL I IARTAKD I KYNADYI KRHLKEVKQMPQDVYEGYL 119 

Query: 146 PLYKS FQENDMKI - DSERSLWDOTAKQTYIALGNMMTAAAM I GVDSCP IEGFDYEKVNNI 204 

+ FQ+ND+ + +S+R+L+DW +KQTYIALGNMMTAAA IGVDSCPIEGF Y+ ++ I 
Sbjct: 120 SKTEEFQKNDLHLLESDRTLFDl-7ASKQTYIALGNMMTAaAQIGVDSCPIEGFQYDHlHRI 179 

Query: 205 LSXWGLIDDKKEAISCMVSFGYRLREPKHSRARKERQEVITWV 247 

L +EGL+++ IS MV+FGYR+R+P+ + R ++V+ WV 
Sbjct: 180 LEEEGLLENGSFDISVMVAFGYRVRDPR-PKTRSAVEDWKWV 221 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4057> which encodes the amino acid
sequence <SEQ ID 4058>. Analysis of this protein sequence reveals the following: 
Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1705 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 126/222 (56%), Positives = 174/222 (77%), Gaps = 4/222 (1%)

Query: 28 EDIKKQVRRAFDFRMAIRVYNNNDIPKEDMEYILDTAWLSPSSVGLEGWRFIjVLDRQTIA 87 

+ I Q+++A FR A+RVY I ED+ ILD AWLSPSS+GLEGWRF+VLD + I 
Sbjct: 3 QTIHHQIQQALHFRTATOVYKEEKISDEDLALILDAAWLSPSSIGLEGWRFVVLDNKPI- 61 

Query: 88 KFRDKLKEVAWGAQYQLDTASHFVLLLAEKGAYYNADSMINSLIRRGLGDPAALESRIPL 147 

++++K AWGAQYQL+TASHF+LL+AEK A Y++ ++ NSL+RRG+ + L SR+ L 
Sbjct: 62 --KEEIKPFAWGAQYQLETASHFILLIAEKHARYDSPAIKNSLLRRGIKEGDGHSISRLKL 119 

Query: 148 YKSFQElTOMKI-DSERSLTOWTAKQlYIALGNMMTAAaMIGvDSCPIEGFDYEKVNNILS 206 

Y+SFQ+ DM + D+ R+L+DWTAKQTYIALGNMM AA++G+D+CPIEGF Y+KVN+IL+ 
Sbjct: 120 YESFQKED^MADNPRALFDWTAKQTYIAI^KIWTAALLGIDTCPIEGFHYDKVNHIIA 179 

Query: 207 KEGLIDDKKEAISCMVSFGYRIjREPKHSRARKERQEVITWVE 248

K +ID +KE 1+ M+S GYRLR+PKH++ RK ++EVI+ V+ 
Sbjct: 180 KHNVIDLEKEGIASMLSLGYRLRDPKHAQVRKPKEEVISWK 221 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1325 

A DNA sequence (GBSx1407) was identified in S.agalactiae <SEQ ID 4059> which encodes the amino
acid sequence <SEQ ID 4060>. This protein is predicted to be lactoylglutathione lyase (gloA). Analysis of
this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1656 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
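
Each "Final Results" block reports one certainty value per compartment, and the predicted localisation is the compartment with the highest value. The following is a minimal Python sketch of reading such a block. It is an illustration only: the names `parse_final_results` and `most_likely_location` are hypothetical, and the pattern is written to cope with the broken spacing found in this text (for example "Certainty=0 . 1656").

```python
import re

# One localisation line. \W* absorbs the optional "---" separator, and the
# digits after the decimal point may be split by stray spaces.
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9]+)\s*\.\s*([0-9 ]+?)\s*\((Affirmative|Not Clear)\)"
)

def parse_final_results(text):
    """Map each compartment to (certainty, verdict) for one Final Results block."""
    results = {}
    for line in text.splitlines():
        match = _RESULT.search(line)
        if match:
            fraction = match.group(3).replace(" ", "")
            certainty = float(match.group(2) + "." + fraction)
            results[match.group(1)] = (certainty, match.group(4))
    return results

def most_likely_location(results):
    """Return the compartment with the highest certainty."""
    return max(results, key=lambda location: results[location][0])

block = """bacterial cytoplasm Certainty=0 . 1656 (Affirmative) < suco
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco"""
parsed = parse_final_results(block)
print(most_likely_location(parsed))
```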

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21986 GB:U32717 lactoylglutathione lyase (gloA) [Haemophilus influenzae Rd] 
Identities = 59/131 (45%) , Positives = 86/131 (65%) , Gaps = 2/131 (1%) 

Query: 1 MPFLHTCIRVKDLDASIAFYQEALGFKEVRRNDFPENQPTLVYMALEDDPSY-ELELTVN 59 

M LHT +RV DLD SI FYQ+ LG + +R ++ PE ++TL ++ ED S E+ELTYN 
Sbjct: 1 MQILHTMLRVGDLDRSIKFYQDV1GMRLIJITSENPEYOTLAFLGYEDGESAAEIELTYN 60 

Query: 60 YDHEAYDLGNGYGHIAVGVDDLETTYDAHQKAGYSVTKISG-LPGKPNMPYFIQDPDGYK 118 

+ + Y+ G YGHIA+GVDD+ T +A + +G +VT+ +G + G + F++DPDGYK 
Sbjct: 61 WGVDKYEHGTAYGHIAIGVDDIYATCEAWASGGNVTREAGPVKGGSTV1AFVEDPDGYK 120' 

Query: 119 IEVIRLSQFKA 129 

IE I K+ 
Sbjct: 121 IEFIENKSTKS 131 
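
The identity count in an alignment header is the number of columns in which the query and subject rows carry the same residue. The following is a minimal Python sketch of that count for a single pair of aligned rows. The function name `count_identities` is hypothetical, and the sketch does not compute "Positives", which would require a substitution matrix that the text does not specify.

```python
def count_identities(query, sbjct):
    """Return (identical_columns, gap_columns, aligned_length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    gaps = 0
    for q, s in zip(query, sbjct):
        if q == "-" or s == "-":
            gaps += 1          # a column containing a gap is never an identity
        elif q == s:
            identical += 1
    return identical, gaps, len(query)

# Final segment of the alignment above (query 119-129, subject 121-131).
print(count_identities("IEVIRLSQFKA", "IEFIENKSTKS"))  # (4, 0, 11)
```

Summing these counts over every segment of an alignment gives the numerator of the "Identities" fraction, provided the rows are free of extraction errors.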

A related DNA sequence was identified in S.pyogenes <SEQ ID 4061> which encodes the amino acid
sequence <SEQ ID 4062>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1382 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/125 (64%) , Positives = 93/125 (74%) , Gaps = 1/125 (0%) 

Query: 1 MPFLHTCIRVKDLDASIABYQEALGFKEVRRl^FPENQFTLVYMALEDDPSYELELTYNY 60 

M LHTCIRVKDLD S+AFY A FKE R DFP++QFTLVY+ALE + SYELELTYNY 
Sbjct: 1 MKALHTCIRVKDIiDQS VAFYTSAFPFKENYRKDFPDSQFTLVYLALEGE - S YELELTYNY 59 

Query: 61 DHFAYDLGNGYGHIAVGVDDLETTYDAHQKAGYSVTKISGLPGKPNMFYFIQDPDGYKIE 120 

H YDLGNGYGHIA+G + E + H++AG+ VT ILK +YFIQDPDGYKIE 
Sbjct: 60 GHGDYDLGNGYGHIALGSEHFEADHKKHRQAGFPVTDIKELADKSARYYFIQDPDGYKIE 119 

Query: 121 VIRLS 125 

VI L+ 
Sbjct: 120 VIDUJ 124 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1326 

A DNA sequence (GBSx1408) was identified in S.agalactiae <SEQ ID 4063> which encodes the amino
acid sequence <SEQ ID 4064>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.02 Transmembrane 241 - 257 ( 229 - 262) 
INTEGRAL Likelihood = -4.94 Transmembrane 270 - 286 ( 264 - 287) 



Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
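
The "INTEGRAL" lines in these listings give a likelihood score, a transmembrane span, and a second range in parentheses. The following is a minimal Python sketch for extracting those numbers. The function name `parse_transmembrane` and the key names are hypothetical, and the parenthesised range is stored as `outer` without interpreting what it means.

```python
import re

# "Likelihood = -9.02 Transmembrane 241 - 257 ( 229 - 262)"; the space after
# "=" is optional because some lines read "Likelihood =-14.22".
_TM = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
    r"\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_transmembrane(line):
    """Return the likelihood and both residue ranges, or None if the line does not match."""
    match = _TM.search(line)
    if match is None:
        return None
    start, end = int(match.group(2)), int(match.group(3))
    return {
        "likelihood": float(match.group(1)),
        "span": (start, end),
        "outer": (int(match.group(4)), int(match.group(5))),
        "length": end - start + 1,
    }

segment = parse_transmembrane(
    "INTEGRAL Likelihood = -9.02 Transmembrane 241 - 257 ( 229 - 262)"
)
print(segment)
```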

The protein has homology with the following sequences in the GENPEPT database. 

3 response 

Query: 3 LLSVIVPCTNEQETVSTFLTEIKKVESEMARYTHFEYIFVNDGSTDRTLELLKKAAKQFD 62 

L+S+I+P YNE V +KK E + Y +E F+NDGS D TL+ +K A 

Sbjct: 5 LISIIIPSYNEGYNVKLIHESLKK-EFKNIHYD-YE1FFINDGSVDDTLQQIKDLAATCS 62 

Query: 63 NVHYLSFSRIIFGKDAALIAGLEHTTGDFITVMDVDLQDPPTLLPEMYLKLQEGYDIVATR 122 

V Y+SFSR+FGK+AA+LAG EH G+ + VMD DLQ P LL E +EGYD V + 

Sbjct: 63 RVKYI S FSRNFGKEAAI LAGFEHVQGEAVI VMDADLQHPTYLLKF.FI KGYF.EGYDQVIAQ 122 

Query: 123 RKDRKBEPLIRSLFAKLFYKLINQVSDTKMVDGARDFRLMTKQWDSILEIjNEvNRFSKG 182 

R +RKG+ +RSL + ++YK IN+. + + DG DFRL+++Q V+++L+L+E NRFSKG 
Sbjct: 123 R-ITOKGDSFWSLLSSMYYKFINKAVEVDLRD3VGDFRLLSRQAVNALLKLSEGNRFSKG 181 

Query: 183 IFSWIGYDVAYISYENRERIAGKTSWSFFNLLKYSLDGFINFSEIPLAIATWIGTLSSVL 242 

+F WIG+D + YEN ER G 4 WSF +L Y +DG ++F+ PL + + G +L 
Sbjct: 182 LFCWIGFDQKIVFYENVERKNGTSKWSFSSLFNYGMDGWSFNHKPLRLCFYTGIFILLL 241 

Query: 243 SLLAIIFIIIRKLLFGDPVSGWASTVTIVLFMGGIQLLSLGIIGKYISKIFLETKKRPVY 302 

S++ II ++ L G V G+ + ++ VLF+GG+QLLSLGIIG+YI +1+ ETKKRP Y 
Sbjct: 242 S 1 1 YI IATFVKILTNGI SVPGYFTI ISAVLFL33VQLLSLGI IGEYIGRI YYETKKRPHY 301 

Query: 303 IVKE 306 

Sbjct: 302 LIKE 305 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4065> which encodes the amino acid
sequence <SEQ ID 4066>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.55 Transmembrane 256 - 272 ( 251 - 282) 
INTEGRAL Likelihood = -5.31 Transmembrane 290 - 306 ( 284 - 307) 



Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9113> which encodes the amino acid sequence 
<SEQ ID 9114>. Analysis of this protein sequence reveals the following:

Final Results

bacterial membrane --- Certainty=0.482 (Affirmative) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 207/307 (67%) , Positives = 258/307 (83%) 

Query: 1 MALLSVIVPCYNEQEWSTFLTEIKKVESEMARYTHFEYIFuNDGSTDRTLELLKKAAKQ 60 

M LLS+IVPC+NE+ + + E+ + +E+ M FEYIF++DGS D TL +L++ A + 

Sbjct: 21 MTLLSIIVPCFNEEANILPYFEEMHQLETSMTNQLAFEYIFIDDGSKDNTLGILRELAAR 80 

Query: 61 FDNVHYLSFSRHFGKDAALLAGLEHTTGDFITVMDVDLQDPPTLLPEMYLKLQEGYDIVA 120 

F NVHYLSFSRHFGK+A LLAGL+ G++ITVMDVDLQDPP LLP MY KL+EGYDIV 
Sbjct: 81 FPNVHYLSFSRHFGKEAGLLAGLKFJ^GmiTOvlDVDLQDPPELLPI^AKLKEGYDIVG 140 

Query: 121 TRRKDRKGEPLIRSLFAKLFYKLINQVSDTKM\'DGARD?RLMTKQVvDSILELI)3EVNRFS 180 

TRR++R+GEPLIRS+ + LFY LI +SDT+MV+G RD+RLMT+QWDSILEL EVNRFS 
Sbjct: 141 TRRQNRQGEPLIRSMCSNLFYGLIKHLSDTEMVNGVRDYRLMTRQWDSILELGEVNRFS 20D 

Query: 181 KGI FSWIGYDVAYI SYENRERIAGKTSWSFFNLLKYSLDGFINFSEI PLAIATWIGTLSS 240 

KGIFSW+GY + Y+S+EN++R GK-f W F+ LL+YSLDGFINFSE+PL IATW GT S 
Sbjct: 201 KGIFSWVGYRITYLSFENQKRKYGKSRWHFWELLRYSLDGFINFSEMPLTIATWTGTFSF 260 

Query: 241 VLSLMIIFIIIRKLLFGDPVSGmSTVTIVLFMCSGIQIiLSLGIIGKYISKIFLETKKRP 300 

++S+ AI+FIIIRK+LFGDPVSGWASTV+I+LFMGGIQL +GI IGKYI SKI FLETKKRP 
Sbjct: 261 LISIFAILFIIIRKILFGDPVSGWASTVSIILFMGGIQLFCMGIIGKYISKI FLETKKRP 320 

Query: 301 VYIVKEE 307 

+YI+KE+ 
Sbjct: 321 LYIIKEK 327 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1327 

A DNA sequence (GBSx1409) was identified in S.agalactiae <SEQ ID 4067> which encodes the amino
acid sequence <SEQ ID 4068>. This protein is predicted to be d-serine/d-alanine/glycine transporter (cycA). 
Analysis of this protein sequence reveals the following: 
Possible site: 49 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.44 Transmembrane 50 - 66 ( 50 - 66)
INTEGRAL Likelihood = -1.49 Transmembrane 27 - 43 ( 27 - 43)

Final Results

bacterial membrane --- Certainty=0.1977 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA83253 GB:Z31377 potential amino acid permease 
[Lactobacillus delbrueckii] 
Identities = 34/55 (61%) , Positives = 44/55 (79%) 

Query: 7 DHTQKSENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSISLTGPSIVLVYAITG 61 

D + ++ +G +R L NRHVQ+IAI GTIGTGLFLGAG +IS TGPS++ +YAI G 
Sbjct: 5 DRSIENTDGTIRSLSNRHVQMIAIG3TIGTGLFLGAGTTISATGPSVIFIYAIMG 59 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4069> which encodes the amino acid
sequence <SEQ ID 4070>. Analysis of this protein sequence reveals the following: 



Possible site: 53

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = … Transmembrane … - 186 ( 161 - …)
INTEGRAL Likelihood = … Transmembrane … - 272 ( 252 - …)
INTEGRAL Likelihood = … Transmembrane … - 368 ( 347 - …)
INTEGRAL Likelihood = … Transmembrane … - 155 ( 133 - …)
INTEGRAL Likelihood = … Transmembrane … - 436 ( 417 - …)
INTEGRAL Likelihood = … Transmembrane … - 72 ( 54 - …)
INTEGRAL Likelihood = … Transmembrane … - 299 ( 282 - …)
INTEGRAL Likelihood = … Transmembrane … - 456 ( 439 - …)
INTEGRAL Likelihood = … Transmembrane … - 47 ( 31 - …)
INTEGRAL Likelihood = … Transmembrane 109 - 125 ( 109 - …)

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 12 DNNELENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSIALTGPSIIFVYMITGAFMFMM 71 

DN + + RGL+NRH+QL+AI G IGTGLFLG+G+SI GPSI+F Y+ITG F F + 
Sbjct: 8 DNFGQQQKLSRGLKNRHIQLMAIGGAIGTGLFLGSGKSIHFAGPSILFAYLITGVFCFFI 67 

Query: 72 MRAIGEMLYYDPDQHTFINFISKYIGPGWGYFSGLSYWISLIFIGMAEITAVGAYVQFWF 131 

+R++GE+L + H+F++F+ Y+G + +G +YW I + MA++TAVG Y Q+W 
Sbjct: 68 IRSLGELLLSNAGYHSFVDFVIUDYLGNMAAFITGWrYWFCWISIiAMADLTAVGIYTQYWL 127 

Query: 132 PSWPAVttlQLVFLVLLSSINLIAWVFGETEFWFAMIKILAILALIATAIFMVLTGFETH 191 

P P WL L+ L++L +NL V++FGE EFWFA+IK++AILALI T I ++ GF 
Sbjct: 128 PDVPQWLPGLLALIILLIMNLATVKljFGELEFWFALIKVIAILALIVTGILLIAKGFSAA 187 

Query: 192 TGHASLSNIFDHFSMFPNGKLKFFMAFQMVFFAYQAIEFVGITTSETANPRKVLPKAIQE 251 

+G ASL+N++ H MFPNG F ++FQMV FA+ IE VG+T ET NP+KV+PKAI + 
Sbjct: 188 SGPASKTOLWSHGGMFPNGWHGFILSFQMWFAFVGIELVGLTAGETENPQKVIPKAINQ 247 

Query: 252 IPTRIVIFYVGALVSIMAIVPSfflQLPVDESPFVMVFKLIGIKWAAALINFWLTSAASAL 311 

IP RI4+FYVGAL IM I PW+ L +ESPFV VF +GI AA+LINFWLTSAASA 
Sbjct: 248 IPTOILLFYVGALFVIMCIYPWNVLNPNESPFVQVFSAVGIWAASLINFVVLTSAASAA 307 

Query: 312 NSTLYSTGRHLYQIANE--TPNALTNRLKINTLSRQGVPSRAIIASAWVGISALINILP 369 
NS L+ST R +Y 4-A + P L L+ VPS A+ S++ + 1 +N L 

- -KKLTSSNVPSNALFFSSIAILIGVSLNYLM 3 61 



Query: 370 GVAnAFSLITASSSGVYIAIYALTMIAHWKYRQSK- -DFMADGYLMPKYKVTTPLTLAFF 427 

F+LIT+ S+ +1 1+ +T+I H KYR+++ + A+ + MP Y ++ LTLAF 
Sbjct: 362 -PEQWTLITSVSTICFIFIWGITVICHLKYT<KTRQHEAKANKFKMPFYPLSNYLTLAFL 420 

Query: 428 AFVFISLFLQESTYIGAIGATIWIIIFGIYSNVK 461 

AF+ + L L ' T I +W ++ I ' V+ 

Sbjct: 421 AFILVILALANDTRIALFVTPVWFVLLIILYKVQ 454 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 48/62 (77%) , Positives = 51/62 (81%) 

Query: 1 MSKNNNDHTQKSENGMVRGLENRITVQLIAIAGTIGTGLFLGAGRSISLTGPSIVLVYAITGA 62 

MS + ENGMVRGLENRHVQLIAIAGTIGTGLFLGAGRSI+LTGPSI+ VY ITGA 

Sbjct: 5 MSIKEQTDNNELENGMVRGLENRHVQLIAIAGTIGTGLFLSAGRSIALTGPSIIFVYMITGA 66 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1328 

A DNA sequence (GBSx1411) was identified in S.agalactiae <SEQ ID 4071> which encodes the amino
acid sequence <SEQ ID 4072>. This protein is predicted to be alkylphosphonate uptake protein (phnA).
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0965 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC77069 GB:AE000483 orf, hypothetical protein [Escherichia coli K12]
Identities = 79/110 (71%), Positives = 91/110 (81%), Gaps = 1/110 (0%)

Query: 1 MSLPNCPKCNSEYVYEDGILLVCPECAYEWNPEE-IEEEVGLIVLDSNGTRIjSDGDTvTV 59 
MSLP+CPKCNSEY YED + +CPECAYEWN B +E LIV D+NG L+DGD+VT+

Sbjct: 1 MSLPHCPKCMSEYTYEDNGmiCPECAYEWNDAEPAQESDELIVKDANGNLLADGDSVTI 60 

Query: 60 IKDLKVKGAPIODIKQGTRVKNIRIiVDGDHNIDCKIDGFGAMKLKSEFVKK 109 
IKDLKVKG+ +K GT+VKNIRLV+GDHNIDCKIDGPG MKLKSEFVKK 
Sbjct: 61 IKDLKVKGSSSMLKIGTKVKNIRLVEGDHNIDCKIDGFGPMKLKSEFVKK 110

A related DNA sequence was identified in S.pyogenes <SEQ ID 4073> which encodes the amino acid 
sequence <SEQ ID 4074>. Analysis of this protein sequence reveals the following: 

Possible site: 14 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3428 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 73/85 (85%), Positives = 79/85 (92%), Gaps = 1/85 (1%) 

Query: 26 CAYEWNP - EE I EEEVGL I VLDSNGTRIiSDGDTVTVI KDLKVKGAPKDI KQGTRVKNI RLV 84

CA+EW P EE EE GL+VLDSNG RLSDGDT+TV+HDLKVKGAPKD+KQGTRVKNIRLV 
Sbjct: 2 CAFEWTPGEEATEEEGLWLDSNGTOIjSrJGDTITVvTaDLKVI<GAPKDLKQGTRVKNIRIiV 61 

Query: 85 DGDHNIDCKIDGFGAMKLKSEFVKK 109
+GDHNIDCKIDGFGAMKLKSEFVKK

Sbjct: 62 EGDHNIDCKIDGFGAMKLKSEFVKK 86 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1329

A DNA sequence (GBSx1412) was identified in S.agalactiae <SEQ ID 4075> which encodes the amino
acid sequence <SEQ ID 4076>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3665 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 500. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1330 

A DNA sequence (GBSx1414) was identified in S.agalactiae <SEQ ID 4077> which encodes the amino
acid sequence <SEQ ID 4078>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.11 Transmembrane 558 - 574 ( 558 - 574)

Final Results

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11971 GB:Z99105 L-glutamine-D-fructose-6-phosphate
amidotransferase [Bacillus subtilis]
Identities = 355/604 (58%), Positives = 445/604 (72%), Gaps = 4/604 (0%)

Query: 

Sbjct: 

Query: 61 SVSGTTGIGHTRWATHC3KPTEGNAHPHTSGSGRFVLVHNGVIENYLQIBCETYLTKHNLKG 120 

+V GIGHTRWATHG+P+ NAHPH S GRF LVHNGVIENY+Q+K+ YL LK 
Sbjct: 61 NVEAKAGIGHTRWATHGEPSYLNAHPHQSALGRFTLVHNGVIENYVQLKQEYLQDVELKS 120 

Query: 121 ETDTEIAIHLVEHFVEEDNLSVLEAFKKALHIIEGSYAFALIDSQDADTIYVAKNKSPLL 180 

+TDTE+ + ++E FV L EAF+K L +++GSYA AL D+ + +TI+VAKNKSPLL 
Sbjct: 121 DTDTEVWQVIEQFVN-GGLETEEAFRKTLTLLKGSYAIALFDNDNRETIFVAKNKSPLL 179 

Query: 181 IGLGNGYNMVCSDAMAMIRETSEYI^IEIHDKELVIVKKDSVEVQDYDGNVIERGSYTAELD 240 

+GLG+ +N+V SDAMAM++ T+EY+E+ DKE+VIV D V +++ DG+VI R SY AELD 
Sbjct: 180 VGLGOTFNWASDAMAMLQVTNEYvELTOKEMVIVTDDQWIKKLDGDVITRASYIAELD 239 

Query: 241 LSDIGKGTYPFYMLKEIDEQPTVMRKLISTYANESGDMNVDSDIIKSVQEADRLYILAAG 300 

SDI KGTYP YMLKE DEQP VMRK+I TY +E+G ++V DI +V EADR+YI+ G 
Sbjct: 240 ASDIEKGTYPHYMLKETDEQPWMRKI IQTYQDENGKLSVPGDIAAAVAEADRIYI IGCG 299 

Query: 301 TSYHAGFAAKTMIEKIiTDTPVELGVSSEWGYNMPLIiSKfCPlVIFILLSQSGETADSRQVLVK 360 

TSYHAG K IE + PVE+ V+SE+ YNMPLLSKKP+FI LSQSGETADSR VLV+ 
Sbjct: 300 TSYHAGLVGKQYIEMWANVPVEVH'i/ASEFSYNt.lPLLSKKPLFIFLSQSGETADSRAVLVQ 359 

Query: 361 ANEMGIPSLTITWPGSTLSREATYT^IHAGPEIAVASTKAYTAQVATIAFIAKAVGEA 420 

+G +LTITNVPGSTLSREA YT+L+HAGPEIAVASTKAYTAQ+A LA LA + 
Sbjct: 360 VKALGHKALTITWPGSTLSREADYTLLIjHAGPEIAVASTKAYTAQIAvLAVLASVAADK 419

Query: 421 NGKAFAKDFDLvHELSIVAQSIEATLSEKDVISEKVEQLLISTRNAFYIGRGNDYYVTME 480 

NG FDLV EL I A ++EA +KD + + L +RNAF+IGRG DY+V +E 

Sbjct: 420 NGINI G - - FDLVKELG I AANAMEALCDQKDEMEMI AREYLTVSRNAF F I GRGLD YFVCVE 477 

Query: 481 AALKLKEISYIQTEGFAAGELKHGTISLIEDNTPVIALISADSTIAAHTRGNIQEWSRG 540

ALKLKEISYIQ EGFA GELKHGTI+L1E TPV AL + + + RGN++EV +RG 
Sbjct: 478 GaLKLKEISYIQflEGFAGGELKHGTIALIEQGTPVFALATQEH-VNLSIRGNVKEVAARG 536 

Query: 541 ANALIIVEEGLEREGDDIIVNKVHPFLSAISM\n:PTQLIAYYASLQRGLDVDKPRNLAKA 600 

AN II +GL+ . D ++ +V+P L+ + V+P QLIAYYA+L RG DVDKPRNLAK+ 
Sbjct: 537 ANTCIISLKGLDDADDRFVLPEVNPAIAPLVSWPLQLIAYYAALHRGCDVDKPRMLAKS 596 

Query: 601 VTVE 604 
VTVE 

Sbjct: 597 VTVE 600

A related DNA sequence was identified in S.pyogenes <SEQ ID 4079> which encodes the amino acid 
sequence <SEQ ID 4080>. Analysis of this protein sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 558 - 574 ( 558 - 574)

Final Results

bacterial membrane --- Certainty=0.1426 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB11971 GB:Z99105 L-glutamine-D-fructose-6-phosphate
amidotransferase [Bacillus subtilis]
Identities = 353/604 (58%), Positives = 445/604 (73%), Gaps = 4/604 (0%)

Query: 1 MCGIVGWGNRNATDILMQGLEKLEYRGYDSAGIFVANANQTNLIKSVGRIADLRAKIGI 60 

MCGIVG +G +A + IL++GLEKLEYRGYDSAGI VAN ++ K GRIADLR + 
Sbjct: 1 MCGIVGYIGQLDAKEILLKGLEKLEYRGYDSAGIAVANEOSIHvFKEKGRIADLREVVDA" 60 

Query: 61 DVAGSTGIGHTRWATHGQSTEDNAHPHTSQTGRFVLVHNGVIENYLHIKTEFLAGHDFKG 120 

+V GIGHTRWATHG+ + NAHPH S GRF LVHNGVIENY+ +K E+L + K 
Sbjct: 61 NVEAKAGIGHTRWATHGEPSYLNAHPHQSALGRFTLVHNGVIENYVQLKQEYLQDVELKS 120 

Query: 121 QTDTEIAVHLIGKFVEEDKLSVLEAFKKSLSIIEGSYAFALMDSQATDTIYVAKNKSPLL 180 

TDTE+ V +1 +FV L EAF+K+L++++GSYA AL D+ +TI+VAKNKSPLL 
Sbjct: 121 DTDTEVWQVIEQFVNGG-LETEEAFRKTLTLLKGSYAIALFDNDNRETIFVAKNKSPLL 179 

Query: 181 IGLGEGYIWCSDAMAMIRETSEFMEIHDKELVILTKDKVTVTDYDGKELIRDSYTAELD 240 

+GLG+ +N+V SDAMAM+4- T+E++E+ DKE+VI+T D+V + + DG + R SY AELD 
Sbjct: 180 VGLGDTFNWASDAMAMLQVTNEYVELMDKEMVIVTDDQWIKNLDGDVITRASYIAELD 239 

Query: 241 LSDIGKGTYPFYMLKEIDEQPTVMRQLISTYADETGNVQVDPAIITSIQEADRLYILAAG 300 

SDI KGTYP YMLKE DEQP VMR++I TY DE G + V 14+ EADR+YI+ G 
Sbjct: 240 ASDIEKGTYPHYMLKETDEQPWMRKIIQTYQDENGKLSVPGDIAAAVAEADRIYIIGCG 299 

Query: 301 TSYHAGFATKNMLEQLTDTPVELGVAS3WGYHMPLLSKKPMFILLSQSGETADSRQVLVK 360 

TSYHAG K +E + PVE+ VASE+ Y+MPLLSKKP+FI LSQSGETADSR VLV+ 
Sbjct: 300 TSYHAGLVGKQYIEMWANVPVEVHVASEFSYNMPLLSKKPLFIFLSQSGETADSRAVLVQ 359 

Query: 361 ANAMGIPSLTVTNVPGSTLSREATYTMLIHAGPEIAVASTt^YTAQIAALAFLAKAVGEA 420 

A+G +LT+TNVPGSTLSREA YT+L+HAGPEIAVASTKAYTAQIA LA LA + 
Sbjct: 360 VKALGHKALTITNVPGSTLSREADYTLLDHAGPEIAVASTKAYTAQIAVLAVLASVAADK 419 

Query: 421 NGKQEALDFNLVHELSLVAQSIEATLSEKDLVAEICVQALLATTRNAFYIGRGNDYYVAME 480 

NG + F+LV EL + A ++EA +KD + + L +RNAF+IGRG DY+V +E 
Sbjct: 420 NGIN--IGFDLVKELGIAANAMEALCDQKDEMEMIAREYLTVSRNAFFIGRGLDYFVCVE 477 

Query: 481 AALKLKE I S YI QCEGFAAGELKHGT I SLIEEDTPVI ALI SSSQLVASHTRGNIQEVAARG 540 

ALKLKEISYIQ EGFA GELKHGTI +LIE+ TPV AL + + S RGN++EVAARG 
Sbjct: 478 GALKLKEISYIQAEGFAGGELKHGTIALIEQGTPVFAIATQEHVNLS - IRGNVKEVAARG 536 

Query: 541 AHVLTvVEEGLDREGDDIIWKVHPFIAPIAMVIPTQLIAYYASLQRGLDVDKPRNLAKA 600 






A+ + +GLD D ++ +V+P IAP+ V+P QLIAYYA+L RG DVDKPRNIAK+ 
Sbjct: 537 ANTCIISLKGLDDADDRFVLPETOPAI^LVSWPLQLIAYYARLHRGCDVDKPRNLAKS 596 

Query: 601 VTVE 604 

Sbjct: 597 VTVE 600 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 500/604 (82%) , Positives = 552/604 (90%) 

Query: 1 MCGIVGWGNTNATDILIQGLEKLEYRGYESAGIFWGDNKSQLVKSVGRIAEIQAKVGD 60 

MCGIVGWGN NATDIL+QGLEKLEYRGYDSAGIFV N++ L+ KSVGRIA+ + +AK+G 
Sbjct: 1 MCGIVGWGNRNATDILMQGLEKLEYRGYESAGIFVANANQTNLIKSVGRIADLRAKIGI 60 

Query: 61 SVSGTTGIGHTRWATHGKPTEGNAHPHTSGSGRFVLVHNGVIENYLQIKETYLTKHNLKG 120 

V+G4TGIGHTRWATHG+ TE NAHPHTS +GRFVLVHNGVIENYL IK +L ■ H+ KG 
Sbjct: 61 DVAGSTGIGHTRWATHGQSTEDNAHPHTSQTGRFVLVIEIGVIENYLHIKTEFLAGHDFKG 120 

Query: 121 ETDTEIAIHLWHEVEEDNLSVLEAFKKALHIIEGSYAFALIDSQDADTIYVAKNKSPLL 180

+TDTEIA+HL+ FVEED LSVLEAFKK+L IIEGSYAFAL+DSQ DTIYVAKNKSPLL 
Sbjct: 121 QTDTEIAVHLIGKFVEEDKLSVLEAFKKSLSIIEGSYAFALMDSQATDTIYVAKNKSPLL 180 

Query: 181 IGLGNGYNMVCSDAMAMIRETSEYMEIHDKELVIVKKDSVEVQDYDGNVIERGSYTAELD 240 

IGLG GYMMVCSDAMAMIRETSE+MEIHDKELVI+ KD V V DYDG + R SYTAELD 
Sbjct: 181 IGLGEGYW^CSDAMAMIRETSEFMEIHDKELVILTKDKVTVTDYDGKELIRDSYTAELD 240 

Query: 241 LSDIGKGTYPFYMLKEIDEQP0TORKLIS1YANESGDMNVDSDIIKSVQEADRLYILAAG 300 
LSDIGKGTYPFYMLKEIDEQPTVMR+LISTYA+E+G++ VD II S+QEADRLYILAAG 
Sbjct: 241 LSDIGKGTYPFYMLKEIDEQPTVMRQLISTYADETGNVQVDPAIITSIQEADRLYILAAG 300

Query: 301 TSYHAGFAAKTMIEKLTDTPVELGVSSEWGYNMPLLSKKPMFILLSQSGETADSRQVLVK 360 

TSYHAGFA K M+E+LTDTPVELGV+SEWGY+MPLLSKKPMFILLSQSGETADSRQVLVK 
Sbjct: 301 TSYHAGFATKNMLEQLTDTPVELGVASEWGYHMPLLSKKPMFILLSQSGETADSRQVLVK 360 

Query: 361 ANEMGIPSLTITIWPGSTLSREATYTMLIHAGPEIAVASTKAYTAQVATI1AFI1AKAVGEA 420

AN MGIPSLT+TNVPGSTLSREATYTMLIHAGPEIAVASTKAYTAQ+A LAFLAKAVGEA 
Sbjct: 361 ANAMGIPSLTVTWPGSTLSRFATYTMIiIHAGPEIAVASTKAYTAQIAALAFLAKAVGEA 420 

Query: 421 NGKAEAKDFDLVHELSIVAQSIEATIjSEKDVISEKVEQLLISTRNAFYIGRGNDYYVTME 480 

NGK EA DF+LVHELS+VAQSIEATLSEKD+++EKV+ LL +TRNAFYIGRGNDYYV ME 
Sbjct: 421 NGKQEALDFNLVHELSLVAQSIEATLSEKDLVAEKVQALLATTRNAFYIGRGNDYYVAME 480 

Query: 481 AALKLKE I S YI QTEGFAAGELKHGTI SLIEDNTPVI AL I SADST I AAHTRGNIQEWSRG 540 

AALKLKEISYIQ EGFAAGELKHGTISLIE++TPVIALIS+ +A+HTRGNIQEV +RG 
Sbjct: 481 AALKLKEISYIQCEGFAAGELKHGTISLIEEDTPVIALISSSQLVASHTRGNIQEVAARG 540 

Query: 541 ANALI IVEEGLEREGDDI IVNKVHPFLSAI SMVI PTQLIAYYASLQRGLDVDKPRNLAKA 600 

A+ L +VEEGL+REGDDIIVNKVHPFL+ I+MVIPTQLIAYYASLQRGLDVDKPRNLAKA 
Sbjct: 541 AHVLTWEEGLDREGDDIIVNKVHPFEAPIAMVIPTQLIAYYASLQRGLDVDKPRNLAKA 600 

Query: 601 VTVE 604 

Sbjct: 601 VTVE 604 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1331 

A DNA sequence (GBSx1415) was identified in S.agalactiae <SEQ ID 4081> which encodes the amino
acid sequence <SEQ ID 4082>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9797> which encodes amino acid sequence <SEQ ID 9798> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44435 GB:U65000 type-I signal peptidase SpsB [Staphylococcus 
aureus] 

Identities = 62/185 (33%) , Positives = 97/185 (51%) , Gaps = 12/185 (6%) 

Query: 10 VKRDFIRNIILALIAVLILILLRYFV?ATFKVKKDATNSY?SNGDVVVVN RNRTPK 65

+K++ + II +A +IL 4+ F+ + 4 ++ + +G+ V VN 4 + 
Sbjct: 1 MKICELLEWIISIAVAFVILFIVGKFIVTPYTIKGESMDPTLKDGERVAVNIIGYKTGGLE 60 

Query: 66 YKDFIVYKVGKIF-YISRVIGEPNQKTOVMDDILYLNDVFKDEPYIEKMICNAYSEKKDGQ 124 

4 +V+ K Y+ RVIG P KV +D LY+N +DEPY+ N + K G 
Sbjct: 61 KGNWVFHANKNDDYVKRVIGVPGDKVEYKNDTLYVNGKKQDEPYL NYNLKHKQGD 116 

Query: 125 MPFTSDFSVETL- -TRNKESRVPKGSYLVtNDNRQHKNDSRKFGLIKEKDIRGVITFKVY 182 

T F V+ L K + +PKS YLVL DNR+ DSR FGLI E I G ++F+ + 

Sbjct: 117 Y-ITGTFQVKDLPNANPICSNVIPKGKYLVLGDNREVSKDSRAFGLIDEDQIVGKVSFRFW 175 

Query: 183 PLSEF 187 
P SEF 

Sbjct: 176 PFSEF 180 
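
Each alignment in this document is preceded by a summary line giving identities, positives and (where present) gaps as count/length pairs with a percentage. As a hedged illustration of how such lines could be collected for comparison across examples, the sketch below extracts the counts and the printed percentage. The function name `parse_alignment_summary` is hypothetical, and the printed percentages are taken as-is rather than recomputed.

```python
import re

# Matches e.g. "Identities = 62/185 (33%)"; spacing around "=" and "/" may vary.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_summary(line):
    """Map each field name to its count, alignment length, printed percent and fraction."""
    stats = {}
    for name, count, length, percent in _FIELD.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,
            "length": length,
            "printed_percent": int(percent),
            "fraction": count / length,
        }
    return stats


if __name__ == "__main__":
    summary = "Identities = 62/185 (33%) , Positives = 97/185 (51%) , Gaps = 12/185 (6%)"
    for field, values in parse_alignment_summary(summary).items():
        print(field, values["count"], values["length"], values["printed_percent"])
```

Summary lines without a Gaps field simply yield a dictionary with two entries.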

A related DNA sequence was identified in S.pyogenes <SEQ ID 4083> which encodes the amino acid 
sequence <SEQ ID 4084>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-14.22 Transmembrane 10 - 26 ( 4 - 34)

Final Results

bacterial membrane Certainty=0.6689 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 99/185 (53%) , Positives = 130/185 (69%) 

Query: 9 MVKRDFIRNIILALIAVLILILLRYFVFATFKVHKDATNSYFSNGDVVVVNRNRTPKYKD 68

MVKRDFIRNI+L LI ++ ILLR FVF+TFKV + N+Y +GD+V + +N PKYKD
Sbjct: 1 MVKRDFIRNILLLLIVIIGAILLRIFVFSTFKVSPETANTYLKSGDLVTIKKNIQPKYKD 60

Query: 69 FIVYKVGKIFYISRVIGEPNQKVRVMDDILYLNDVFKDEPYIEKMKNAYSEKKDGQMPFT 128

F+VY+VGK Y+SRVI V MDDI YLN++ + + Y+EKMK Y +T
Sbjct: 61 FVVYRVGKKDYVSRVIAVEGDSVTY^DIFYLNNMVESQAYLEKMI^AHYIjNElAPFGTL^ 120

Query: 129 SDFSVETLTRNKESRVPKGSYLVLNDNRQNKNDSRKFGLIKEKDIRGVITFKVYPLSEFG 188

DF+V T+T +K +VPKG YL+LNDNR+N NDSR+FGLI I+G++TF+V PLS+FG
Sbjct: 121 DDFTVATITADKYQKVPKGKYLLLNDNRKNTNDSRRFGLINASQIKGLVTFRVLPLSDFG 180

Query: 189 FTASE 193 
F E 

Sbjct: 181 FVEVE 185 

A related GBS gene <SEQ ID 8789> and protein <SEQ ID 8790> were also identified. Analysis of this 
protein sequence reveals the following: 
Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 10.13
GvH: Signal Score (-7.5): 0.45
Possible site: 37
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 0 value: 3.82 threshold: 0.0
PERIPHERAL Likelihood =3.82 69
modified ALOM score: -1.26

*** Reasoning Step: 3

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

36.0/59.9% over 165aa 

Bacillus caldolyticus 
EGAD|24914| signal peptidase i Insert characterized

ORF00169(364 - 867 of 1179) 

EGAD| 24914 | 25718 (15 - 180 of 182) signal peptidase i {Bacillus caldolyticus} 
%Match =11.9 
%Identity =35.9 %Similarity =59.9

Matches = 60 Mismatches = 61 Conservative Sub.s = 40 

312 342 372 402 432 462 483 510 

L*KHDIMEKRLGVWVKRDFIKWIILALIAVLILILLRYFVFATFKVHKmTNSYFSN(3DVVvA7NR- - -NRTPKYK-DFI 
, 1 :: ::|| :: || ||h i I = >1::"||> =1=11

VTKQKEKRGRRWPWFVAVCWATLRLFVFSNYVVEGKSMMPTLESGNLLIVNKLSYDIGPIRRFDII 
10 20 30 40 50 60 

537 567 597 627 657 687 717 747 

VYKVGKIF-YISRVIGEPNQKVRVMDDILYLNDVFKDEPYIEKMKNAYSEKKDGQMPFTSDFSVETLTRNKESRVPKGSY

, i= i i = mi i =1111 = 1 mi= i •• n== i ii==i =i ==m.i 

VFHANKKEDYVKRVIGLPGDRIAYKNDILYVNGKKVDEPYLRPYKQ KLLDGRL--TGDFTLEEVT--GKTRVPPGCI 

80 90 100 110 120 130 140 

777 807 837 867 897 927 957 987

LVXJTONRQNKNDSRKFGLIKEKDIRGVITFKVYPLSEFGFTASE**™^ 
=11 111 = III lls:| | | : |: :|: :| | 
FVLGDNRLSSWDSRHFGFVKINQIVGJOTDFRYWPFKQFAFQF 
150 160 170 180 


SEQ ID 8790 (GBS7) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 1 (lane 4; MW 46kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 4; MW 21kDa). The GBS7-His fusion 
product was purified (Figure 189, lane 6) and used to immunise mice. The resulting antiserum was used for 
FACS (Figure 262), which confirmed that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1332 

A DNA sequence (GBSx1416) was identified in S.agalactiae <SEQ ID 4085> which encodes the amino
acid sequence <SEQ ID 4086>. Analysis of this protein sequence reveals the following:

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1099 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9795> which encodes amino acid sequence <SEQ ID 9796> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MNKRVKIVATLGPAVEFRGGKKFGESGYWGESLDVEASAEKIAQLIKEGANVFRFNFSHG 60

MNKRVKIVATLGPAVE RGGKKFGE GYW E LD +ASA+ IAQLI+EGANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKFGEDGYWSEKLDPDASAKNIAQLIEEGANVFRFNFSHG 60

AE IAGQKVGFLLDTKGPEIRTELFE A ++Y TG ++R+ATKQ

G+KST +VIALNVAG LDIFDDVEVGKQ+LVDDGKLGL V KD + REF V VENDG+I

GHVKL AKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK

V+TATNMLETMT+KPRATRSEVSDVFNAVIDGTDATMLSGESANG YPVESVRTMATI

KNAQTLL EYGRL+SS F R++ T+V+ASAVKDAT+SM I+L+V +TE+GNTA I +R
A related DNA sequence was identified in S.pyogenes <SEQ ID 4087> which encodes the amino acid 
sequence <SEQ ID 4088>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0915 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif: 272-274

The protein has homology with the following sequences in the databases:



>GP:AAF25804 GB:AF172173 pyruvate kinase [Streptococcus thermophilus] 
Identities = 404/500 (80%) , Positives = 457/500 (90%) 
Query: 1 MNKRVKIVATLGPAVEIRGGKKYGEDGYWAGQLDVEESAKKIAELIEAGANVFRFNFSHG 60

MNKRVKIVATLGPAVEIRGGKK+GEDGYW+ +LD + SAK IA+LIE GANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKFGEDGYWSEKLDPDASAKNIAQLIEEGANVFRFNFSHG 60

Query: 61 DHKEQGDRMATVRLAEEIARQKVGFLLDTKGPEMRTELFADDAKEFSYVTGEKIRVATTQ 120

+H EQG+RM VR+AE IA QKVGFLLDTKGPE+RTELF DAKE++Y TGE+IR+AT Q
Sbjct: 61 NHAEQGERMDVVRMAESIAGQKVGFLLDTKGPEIRTELFEGDAKEYAYKTGEQIRIATKQ 120

Query: 121 GIQSTRDVIALNVAGSLDIYDEVEVGHTILIDDGKLGLKVIDKDIATRQFIVEVENDGII 180

G++STRDVIALNVAG+LDI+D+VEVG +L+DDGKLGL+V+DKD R+FIVEVENDGII
Sbjct: 121 GLKSTRDVIALNVAGALDIFDDVEVGKQVLVDDGKLGLRVVDKDAEKREFIVEVENDGII 180

Query: 181 AKQKGVNIPNTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVEEVREICRETGN 240

AKQKGVNIP TKIPFPALAERDNADIRFGLEQG+NFIAISFVRTAKDV+EVR IC ETGN
Sbjct: 181 AKQKGVNIPYTKIPFPALAERDNADIRFGLEQGINFIAISFVRTAKDVQEVRAICEETGN 240

Query: 241 DHVQLFAKIENQQGIDNLDEIIEAADGIMIARGDMGIEVPFEMVPVFQKMIITKVNAAGK 300

HV+L AKIENQQGIDN+DEIIEAADGIMIARGDMGIEVPFEMVPV+QKMIITKVNAAGK
Sbjct: 241 GHVKLLAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360

V+TATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANG YPVESVRTMATI
Sbjct: 301 IVVTATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGPYPVESVRTMATIH 360

Query: 361 RNAQTLLNEYGRLDSSAFPRTNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420

+NAQTLL EYGRL+SS F R++ T+V+ASAVKDAT+SM I+L+V +TE+GNTA I +R
Sbjct: 361 KNAQTLLKEYGRLNSSTFDRSSNTEVVASAVKDATNSMHIQLIVALTESGNTASLIDTYR 420

Query: 421 PDADILAVTFDEKVQRALMINWGVIPVLAEKPASTDDMFEVAERVAVEAGLVQSGDNIVI 480

P+ADI A+TFDE Q++LM+NWGVIPV+ E P+STDDMFEVAERVA+E+GLV+SGDNIVI
Sbjct: 421 PEADIWAITFDELTQKSLMLNWGVIPVVTETPSSTDDMFEVAERVALESGLVESGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500 

VAGVPVG+G TNTMR+RTVK 
Sbjct: 481 VAGVPVGSGNTNTMRIRTVK 500 
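
The identity count reported for an alignment can be checked row by row by comparing the Query and Sbjct strings position by position. The sketch below is a minimal illustration under the assumption that both rows are already aligned and of equal length, with "-" marking gaps; `count_identities` is a hypothetical helper, and it does not attempt to reproduce the Positives figure, which depends on a substitution matrix.

```python
def count_identities(query_row, sbjct_row):
    """Count positions where two aligned rows carry the same residue."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    return sum(
        1
        for q, s in zip(query_row, sbjct_row)
        if q == s and q != "-"  # a gap aligned to a gap is not an identity
    )


if __name__ == "__main__":
    # Final row (residues 481-500) of the alignment above.
    query = "VAGVPVGTGGTNTMRVRTVK"
    sbjct = "VAGVPVGSGNTNTMRIRTVK"
    print(count_identities(query, sbjct), "of", len(query))
```

Summing this count over all rows of an alignment gives a figure that can be compared with the numerator of the Identities field.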

An alignment of the GAS and GBS proteins is shown below. 

Identities = 440/500 (88%), Positives = 462/500 (92%) 

Query: 1 MNKRVKIVATLGPAVEFRGGKKFGESGYWGESLDVEASAEKIAQLIKEGANVFRFNFSHG 60

MNKRVKIVATLGPAVE RGGKK+GE GYW LDVE SA+KIA+LI+ GANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKYGEDGYWAGQLDVEESAKKIAELIEAGANVFRFNFSHG 60

Query: 61 DHAEQGARMATVRKAEEIAGQKVGFLLDTKGPEIRTELFEDGADFHSYTTGTKLRVATKQ 120

DH EQG RMATVR AEEIA QKVGFLLDTKGPE+RTELF DA SY TG K+RVAT Q
Sbjct: 61 DHKEQGDRMATVRLAEEIARQKVGFLLDTKGPEMRTELFADDAKEFSYVTGEKIRVATTQ 120

Query: 121 GIKSTPEVIALNVAGGLDIFDDVEVGKQILVDDGKLGLTVFAKDKDTREFEVVVENDGLI 180

GI+ST +VIALNVAG LDI+D+VEVG IL+DDGKLGL V KD TR+F V VENDG+I
Sbjct: 121 GIQSTRDVIALNVAGSLDIYDEVEVGHTILIDDGKLGLKVIDKDIATRQFIVEVENDGII 180

Query: 181 GKQKGVNIPYTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVNEVRAICEETGN 240

KQKGVNIP TKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDV EVR IC ETGN
Sbjct: 181 AKQKGVNIPNTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVEEVREICRETGN 240

Query: 241 GHVKLFAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

HV+LFAKIENQQGIDN+DEIIEAADGIMIARGDMGIEVPFEMVPV+QKMIITKVNAAGK
Sbjct: 241 DHVQLFAKIENQQGIDNLDEIIEAADGIMIARGDMGIEVPFEMVPVFQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTDKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360

AVITATNMLETMT+KPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID
Sbjct: 301 AVITATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360

Query: 361 KNAQTLLNEYGRLDSSAFPRNNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420

+NAQTLLNEYGRLDSSAFPR NKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR
Sbjct: 361 RNAQTLLNEYGRLDSSAFPRTNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420

Query: 421 PDADILAVTFDEKVQRSLMINWGVIPVLADKPASTDDMFEVAERVALEAGFVESGDNIVI 480

PDADILAVTFDEKVQR+LMINWGVIPVLA+KPASTDDMFEVAERVA+EAG V+SGDNIVI
Sbjct: 421 PDADILAVTFDEKVQRALMINWGVIPVLAEKPASTDDMFEVAERVAVEAGLVQSGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500

VAGVPVGTGGTNTMRVRTVK
Sbjct: 481 VAGVPVGTGGTNTMRVRTVK 500

A related GBS gene <SEQ ID 8791> and protein <SEQ ID 8792> were also identified. Analysis of this
protein sequence reveals the following:

Belongs to Glycolysis/gluconeogenesis pathway. Proteins belonging to this metabolic
pathway have been experimentally detected on the surface of Streptococci.

The protein has homology with the following sequences in the databases:

>GP|6708108|gb|AAF25804.1|AF172173_2|AF172173 pyruvate kinase
{Streptococcus thermophilus}



Query: 1 MNKRVKIVATLGPAVEFRGGKKFGESGYWGESLDVEASAEKIAQLIKEGANVFRFNFSHG 60

MNKRVKIVATLGPAVE RGGKKFGE GYW E LD +ASA+ IAQLI+EGANVFRFNFSHG
Sbjct: 1 MNKRVKIVATLGPAVEIRGGKKFGEDGYWSEKLDPDASAKNIAQLIEEGANVFRFNFSHG 60

Query: 61 DHAEQGARMATVRKAEEIAGQKVGFLLDTKGPEIRTELFEDGADFHSYTTGTKLRVATKQ 120

+HAEQG RM VR AE IAGQKVGFLLDTKGPEIRTELFE A ++Y TG ++R+ATKQ
Sbjct: 61 NHAEQGERMDVVRMAESIAGQKVGFLLDTKGPEIRTELFEGDAKEYAYKTGEQIRIATKQ 120

Query: 121 GIKSTPEVIALNVAGGLDIFDDVEVGKQILVDDGKLGLTVFAKDKDTREFEVVVENDGLI 180

G+KST +VIALNVAG LDIFDDVEVGKQ+LVDDGKLGL V KD + REF V VENDG+I
Sbjct: 121 GLKSTRDVIALNVAGALDIFDDVEVGKQVLVDDGKLGLRVVDKDAEKREFIVEVENDGII 180

Query: 181 GKQKGVNIPYTKIPFPALAERDNADIRFGLEQGLNFIAISFVRTAKDVNEVRAICEETGN 240

KQKGVNIPYTKIPFPALAERDNADIRFGLEQG+NFIAISFVRTAKDV EVRAICEETG
Sbjct: 181 AKQKGVNIPYTKIPFPALAERDNADIRFGLEQGINFIAISFVRTAKDVQEVRAICEETGN 240

Query: 241 GHVKLFAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

GHVKL AKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK
Sbjct: 241 GHVKLLAKIENQQGIDNIDEIIEAADGIMIARGDMGIEVPFEMVPVYQKMIITKVNAAGK 300

Query: 301 AVITATNMLETMTDKPRATRSEVSDVFNAVIDGTDATMLSGESANGKYPVESVRTMATID 360

V+TATNMLETMT+KPRATRSEVSDVFNAVIDGTDATMLSGESANG YPVESVRTMATI
Sbjct: 301 IVVTATNMLETMTEKPRATRSEVSDVFNAVIDGTDATMLSGESANGPYPVESVRTMATIH 360

Query: 361 KNAQTLLNEYGRLDSSAFPRNNKTDVIASAVKDATHSMDIKLVVTITETGNTARAISKFR 420

KNAQTLL EYGRL+SS F R++ T+V+ASAVKDAT+SM I+L+V +TE+GNTA I +R
Sbjct: 361 KNAQTLLKEYGRLNSSTFDRSSNTEVVASAVKDATNSMHIQLIVALTESGNTASLIDTYR 420

Query: 421 PDADILAVTFDEKVQRSLMINWGVIPVLADKPASTDDMFEVAERVALEAGFVESGDNIVI 480

P+ADI A+TFDE Q+SLM+NWGVIPV+ + P+STDDMFEVAERVALE+G VESGDNIVI
Sbjct: 421 PEADIWAITFDELTQKSLMLNWGVIPVVTETPSSTDDMFEVAERVALESGLVESGDNIVI 480

Query: 481 VAGVPVGTGGTNTMRVRTVK 500

VAGVPVG+G TNTMR+RTVK
Sbjct: 481 VAGVPVGSGNTNTMRIRTVK 500

SEQ ID 8792 (GBS330) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 5; MW 59kDa). 

GBS330-His was purified as shown in Figure 213, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1333

A DNA sequence (GBSx1417) was identified in S.agalactiae <SEQ ID 4089> which encodes the amino
acid sequence <SEQ ID 4090>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0632 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGINQGYYGMVTGDIFPLDANSVGDT 60

MKRIAVLTSGGDAPGMNAA+RAVV KAISEG+EV+GIN+GY GMV GDIF LDA V +
Sbjct: 1 MKRIAVLTSGGDAPGMNAAVRAVVLKAISEGIEVFGINRGYAGMVEGDIFKLDAKRVENI 60

Query: 61 INRGGTFLRSARYPEFAELEGQLKGIEQLKKHGIEGVVVIGGDGSYHGAMRLTEHGFPAV 120

Sbjct: 61 LSRGGTFLQSARYPEFAQLEGQLKGIEQLKKYGIEGVVVIGGDGSYHGAMRLTEHGFPAV 120

Query: 121 GLPGTIDNDIVGTDYTIGFDTAVATAVENLDRLRDTSASHNRTFVVEVMGRNAGDIALWS 180

GLPGTIDNDIVGTDYTIGFDTAVATA E LD+++DT+ SH RTFVVEVMGRNAGDIALW+
Sbjct: 121 GLPGTIDNDIVGTDYTIGFDTAVATATEALDKIQDTAFSHGRTFVVEVMGRNAGDIALWA 180

Query: 181 GIAAGADQIIVPEEEFNIDEVVSNVRAGYAAG-KHHQIIVLAEGVMSGDEFAKTMKAAGD 239

GIA+GADQIIVPEEE++I+EVV V+ GY +G K H IIVLAEGVM +EFA MK AGD
Sbjct: 181 GIASGADQIIVPEEEYDINEVWKVKEGYESGEKSHHIIVLAEGVMGAEEFAAKMKEAGD 240

Query: 240 DSDLRVTNLGHLLRGGSPTARDRVLASRMGAYAVQLLKEGRGGLAVGVHNEEMVESPILG 299

SDLR TNLGH++RGGSPTARDRVLAS MGA+AV LLKEG GG+AVG+HNE++VESPILG
Sbjct: 241 TSDLPATNLGHVIRGGSPTARDRVLASWMGAHAVDLLKEGIGGVAVGIHNEQLVESPILG 300

Query: 300 IAEEGALFSLTDEGKIVVNNPHKADLRIAALNRDLAN 336

AEEGALFSLT++GKI+VNNPHKA L A LNR LAN
Sbjct: 301 TAEEGALFSLTEDGKIIVNNPHKARLDFAELNRSLAN 337



Proteins in the glycolysis/gluconeogenesis pathway have been experimentally detected on the surface of
Streptococci. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4091> which encodes the amino acid
sequence <SEQ ID 4092>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0632 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 274/336 (81%) , Positives = 306/336 (90%) , Gaps = 1/336 (0%) 

Query: 1 MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGINQGYYGMVTGDIFPLDANSVGDT 60

MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGIN+GY GMV GDIFPL + VGD
Sbjct: 1 MKRIAVLTSGGDAPGMNAAIRAVVRKAISEGMEVYGINRGYAGMVDGDIFPLGSKEVGDK 60

Query: 61 INRGGTFLRSARYPEFAELEGQLKGIEQLKKHGIEGVVVIGGDGSYHGAMRLTEHGFPAV 120

I+RGGTFL SARYPEFA+LEGQL GIEQLKKHGIEGVVVIGGDGSYHGAMRLTEHGFPAV
Sbjct: 61 ISRGGTFLYSARYPEFAQLEGQLAGIEQLKKHGIEGVVVIGGDGSYHGAMRLTEHGFPAV 120

Query: 121 GLPGTIDNDIVGTDYTIGFDTAVATAVENLDRLRDTSASHNRTFVVEVMGRNAGDIALWS 180

G+PGTIDNDI GTDYTIGFDTAV TAVE +D+LRDTS+SH RTFVVEVMGRNAGDIALW+
Sbjct: 121 GIPGTIDNDIAGTDYTIGFDTAVNTAVEAIDKLRDTSSSHGRTFVVEVMGRNAGDIALWA 180

Query: 181 GIAAGADQIIVPEEEFNIDEVVSNVRAGYA-AGKHHQIIVLAEGVMSGDEFAKTMKAAGD 239

GIA+GADQIIVPEEEF+I + +V S ++ + GK+H IIVLAEGVMSG+ FA+ +K AGD
Sbjct: 181 GIASGADQIIVPEEEFDIEKVASTIQYDFEHKGKNHHIIVLAEGVMSGEAFAQKLKEAGD 240

Query: 240 DSDLRVTNLGHLLRGGSPTARDRVLASRMGAYAVQLLKEGRGGLAVGVHNEEMVESPILG 299

SDLRVTNLGH+LRGGSPTARDRV+AS MG++AV+LLK+G+GGLAVG+HNEE+VESPILG
Sbjct: 241 KSDLRVTNLGHILRGGSPTARDRVIASWMGSHAVELLKDGKGGLAVGIHNEELVESPILG 300

Query: 300 LAEEGALFSLTDEGKIVVNNPHKADLRIAALNRDLA 335

AEEGALFSLT+EGKI+VNNPHKA L AALNR L+
Sbjct: 301 TAEEGALFSLTEEGKIIVNNPHKARLDFAALNRSLS 336



SEQ ID 4090 (GBS313) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 45 (lane 5; MW 41kDa). 

GBS313-His was purified as shown in Figure 204, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1334

A DNA sequence (GBSx1418) was identified in S.agalactiae <SEQ ID 4093> which encodes the amino
acid sequence <SEQ ID 4094>. This protein is predicted to be DNA polymerase III alpha subunit (dnaE). 
Analysis of this protein sequence reveals the following: 

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1446 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 4096.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1335

A DNA sequence (GBSx1419) was identified in S.agalactiae <SEQ ID 4097> which encodes the amino
acid sequence <SEQ ID 4098>. This protein is predicted to be YHCF (farR). Analysis of this protein
sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3316 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 
>GP:BAB04102 GB:AP001508 transcriptional regulator (GntR family)
[Bacillus halodurans] 
Identities = 51/116 (43%) , Positives = 79/116 (67%) 

Query: 5 FNEKSPIYSQIAEHIKMQIVSQEIKSGDQLPTVRELAQEAGVNPNTMQRAFTELEREGMV 64

F+ PIY Q+AE +K QIV E++ G++LP+VR++ EA VNPNT+QR + ELE +V 
Sbjct: 5 FHSSEPIYLQLAERVKRQIVRGELRLGEKLPSVRDMGIEANVNPNTVQRTYRELEGLKIV 64 

Query: 65 FSQRTSGRFVTEDNLLIGKIRQQVAKAEIATFVNNMKKIGYKLDEITVALDHFIKE 120

S+R G FVTED ++ IR+Q+ + E++ FV M+++GY +EI L+ ++ E 
Sbjct: 65 ESKRGQGTFVTEDEQVLQMREQMKETEISHFVQGMREMGYSDNEIQAGLESYLTE 120 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4099> which encodes the amino acid 
sequence <SEQ ID 4100>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2075 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/120 (66%), Positives = 100/120 (82%) 

Query: 1 MAWEFNEKSPIYSQIAEHIKMQIVSQEIKSGDQLPTVRELAQEAGVNPNTMQRAFTELER 60 

M+W+F EKSPIY+QIA+H+ MQI+SQEIKSGDQLPTVRE A+ AGVNPNTMQRAFTELER 
Sbjct: 1 MSWKFEEKSPIYAQIAQHVMMQIISQEIKSGDQLPTVREYAEIAGVNPNTMQRAFTELER 60

Query: 61 EGMVFSQRTSGRFVTEDNLLIGKIRQQVAKAEIATFVNNMKKIGYKLDEITVALDHFIKE 120

EGMV+SQRT+GRFVT+D LI + R+++A +EL +F+ NM K+G+ EI L F+KE
Sbjct: 61 EGMVYSQRTAGRFVTDDQKLIARKRREIAISELESFITNMTKMGFSHTEIIPVLTSFLKE 120

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1336 

A DNA sequence (GBSx1420) was identified in S.agalactiae <SEQ ID 4101> which encodes the amino
acid sequence <SEQ ID 4102>. This protein is predicted to be ABC transporter, ATP-binding protein
(yhcG). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2757 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12735 GB:Z99108 similar to glycine betaine/L-proline
transport [Bacillus subtilis] 
Identities = 87/228 (38%) , Positives = 150/228 (65%) , Gaps = 1/228 (0%) 

Query: 5 LQLHHVTKKYHKHTAVNDVTVSIPTGKIIGLLGPNGSGKTTIIKMINGLLQPDKGDIVID 64 

++L HV+KKY +HTAVNDV++++ +G+I GL+GPNGSGK+T +KM+ GLL P G + +D 
Sbjct: 3 IKLEHVSKKYGRHTAVNDVSITLSSGRIYGLIGPNGSGKSTTLKMMAGLLFPTSGFVKVD 62

Sbjct: 63 EEQVTREMVRQTAYLTELDMFYPH?TVKDNM\ T FYQSQFPD?HTEQWK1LNEMQLKPEKK 122

Query: 125 LKNLSKGNKEKVQLILVMSRKARLYILDEPIGGVDPAARDYILKTIISNYSNDAS-VLIS 183

+K LSKGN+ +++++L ++R+A + +LDEP G+DP RD I+ +++S + V+I +
Sbjct: 123 IKKLSKGITOGRLKIVlALARRADVILIiDEPFSGLDPMVRDSIVNSLVSYIDFEQQIWIA 182

Query: 184 THLISDIEPILDEVIFLKEGEIDLQGNADDLREEHNCSIDALFRERFK 231

TH I +IE +LDEVI L GE Q +D+RE+ S+ F+ + +
Sbjct: 183 THEIDEIETLLDEVIILANGEKVAQREVEDIREQEGMSVIQWFKSKME 230



A related DNA sequence was identified in S.pyogenes <SEQ ID 4103> which encodes the amino acid 
sequence <SEQ ID 4104>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1983 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 171/231 (74%) , Positives = 200/231 (86%) 

Query: 1 MTQLLQLHHVTKKYHKHTAVNDVTVSIPTGKIIGLLGPNGSGKTTIIKMINGLLQPDKGD 60

M LLQLHHV+K Y + A++D+T++IP GKIIGLLGPNGSGKTT+IK+INGLLQP+KG+
Sbjct: 1 MAHLLQLHHVSKSYREKKAIDDLTITIPNGKIIGLLGPNGSGKTTLIKLINGLLQPNKGE 60

Query: 61 IVIDGYRPSVETKKIISYLPDTSYLQENMKIKDVVTLFEDFYNDFDSKVAYQLFEDLNLN 120

IVIDGYRP VETKKIISYLPDT+YL ENM+IKD++ F DFY+DFD A L DL L+
Sbjct: 61 IVIDGYRPCVETKKIISYLPDTTYLNENMRIKDMLEFFSDFYSDFDKSKATSLLRDLELD 120

Query: 121 PRERLKNLSKGNKEKVQLILVMSRKARLYILDEPIGGVDPAARDYILKTIISNYSNDASV 180

P +R K LSKGNKEKVQLILVMSRKARLY+LDEPIGGVDPAARDYILKTII++Y +ASV
Sbjct: 121 PEDRFKTLSKGNKEKVQLILVMSRKARLYVLDEPIGGVDPAARDYILKTIINSYCENASV 180

Query: 181 LISTHLISDIEPILDEVIFLKEGEIDLQGNADDLREEHNCSIDALFRERFK 231

+ISTHLISDIEPILDEVIFLK+G + L GNADDLR+E+ SID+LFRE +K
Sbjct: 181 IISTHLISDIEPILDEVIFLKQGRLFLSGNADDLRQEYQQSIDSLFRETYK 231

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1337 

A DNA sequence (GBSx1421) was identified in S.agalactiae <SEQ ID 4105> which encodes the amino
acid sequence <SEQ ID 4106>. Analysis of this protein sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9
INTEGRAL Likelihood = -9
INTEGRAL Likelihood = -6
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -1

Transmembrane 22 - 38
Transmembrane 192 - 208

( 187 - 218) ( 228 - 253) ( 155 - 175) ( 103 - 119)

Final Results

bacterial membrane --- Certainty=0.7156 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4107> which encodes the amino acid
sequence <SEQ ID 4108>. Analysis of this protein sequence reveals the following: 



Possible site: 28
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood = -0.

Transmembrane 190 - 206 ( 187 - 215)
Transmembrane 121 - 137 ( 104 -
Transmembrane 63 - 79 ( 59 -
Transmembrane 158 - 174 ( 156 -
Transmembrane 232 - 248 ( 232 -
- 120 ( 104 - 120)

Final Results

bacterial membrane Certainty=0.5607 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 116/267 (43%) , Positives = 165/267 (61%) , Gaps = 13/267 (4%) 

Query: 1 MFGKLLKYELKSVGKWYLTLNAAVLLVSIILGLVLKALG GNFSTDTNSTSAQIFT 55 

MFGKLLKYE +S+GKWY LNA V+ ++ IL +K G F TN ++ 
Sbjct: 1 MFGKLLKVEFRSIGKWYFALNAFVIAIAAILSFTIKLFAQSNSDGLFGVLTN KMLP 56 

Query: 56 IILvLLLAMVISGSLLSTLAIIIKRFYSNIFGRQGYLTLTLPVTTNQIICSKLLASLLWS 115 

+ L L +I+GSLLSTL IIIKRF ++FG +GYLTLTLPV ++QII SKLLAS + S 
Sbjct: 57 LTLGLTFGSLIAGSLLSTLLIIIKRFSKSVFGWEGYLTLTLPVNSHQIILSKLLASFICS 116 

Query: 116 IFNIFIVIIGIILVILPLVGIGQFVVAFPEIYKIISSSNAPLFIAYFFLSYVAGTLLIYL 175 

+FN 1+ I +VI+P+ I + + F +K+ N +AY LS LLIYL 
Sbjct: 117 VFNTIILAFAIAIVIVPMFNINELLEGFFTJSFKTOYFINMLTvLAYVLLSTFTSILLIYL 176 

Query: 176 SIAVGQLFTNKRVLMGIVSYFGISLLITFLTLIIDSIFHIDLFNSHANA-TFSQPVLLY- 233 

SI++GQLF+N+R LM ++YF + +LI+ + S HI N+ A++ F++ +Y 

Sbjct: 177 SISIGQLFSNRRGLMAFIAYFILVILISVAATYVHS- -HIFNINTSADSFPFTEQKTIYL 234 

Query: 234 NILVSIVEIAIFYMLTHSIIKYKLNIQ 260

IL +E+ +FY+ T+ IIK KLN+Q
Sbjct: 235 LILEQFIEMIMFYLATNFIIKNKLNLQ 261

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1338 

A DNA sequence (GBSx1422) was identified in S.agalactiae <SEQ ID 4109> which encodes the amino
acid sequence <SEQ ID 4110>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5890 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF24 from S.faecalis. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1339 

A DNA sequence (GBSx1423) was identified in S.agalactiae <SEQ ID 4111> which encodes the amino
acid sequence <SEQ ID 4112>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3316 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF23 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1340 

A DNA sequence (GBSx1424) was identified in S.agalactiae <SEQ ID 4113> which encodes the amino
acid sequence <SEQ ID 4114>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4256 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF22 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1341 

A DNA sequence (GBSx1425) was identified in S.agalactiae <SEQ ID 4115> which encodes the amino
acid sequence <SEQ ID 4116>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.37 Transmembrane 62 - 78 ( 55 - 84)
INTEGRAL Likelihood = -8.44 Transmembrane 19 - 35 ( 14 - 41)

Final Results

bacterial membrane --- Certainty=0.6349 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ORF21 from S.faecalis.






A related DNA sequence was identified in S.pyogenes <SEQ ID 4117> which encodes the amino acid
sequence <SEQ ID 4118>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2444 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/236 (22%) , Positives = 95/236 (39%) , Gaps = 12/236 (5%) 

Query: 204 KI1GKLRLMKNVWWEYDKLPHMLIAGGTGGGKTYFILTLIEALLHTDSKLYILDPKN 259 

+ GK+ ++K+ DK H IAG +G GK Y LT ++L S L I+DPK 

Sbjct: 14 QQGKIPVIKHFELNLDKGSHWAIAGNSGSGKPY-ALTYFLSVLKPKSGLIIIDPKFDTPS 72 

Query: 260 --ADIADLGSVMANVYYRKEDLLSCIETFYEEMMKRSEEMKQMKNYKTGKNYAYLGLPAH 317

A + + +KD+S+ + ++ + + ++L + 

Sbjct: 73 QWARENKIAVIHPVENHSKSDFVSQVNEQLNQCATLIQKRQAILYDNPNHQFTHLTI--- 129 

Query: 318 FLIFDEYVAFMEMLGTKEOTAVMNKLKQIVMLGRQAGFFLIIACQRPDAKYLGDGIRDQF 377 

+ DE +A E + A + L QI +LG L L QR D + +R+Q 

Sbjct: 130 --VIDEVXALSEGVNKNIKEAFFSLLSQIADIjGHATKIHLFLGSQRFDrorriPISVREQL 187 

Query: 378 NFRVALGRMSEMGYGMMFGSDVQKDFFLKRIKGRGYVDVGTSVISEFYTPLVPKGY 433

N + +G +++ +F + + G G + V + S PL+ Y 

Sbjct: 188 NVLLQIGNINQKTTQFLFPDLDPEGIVrPTGHGTGIIQWDNEHSYQVLPLLCPTY 243 

SEQ ID 4116 (GBS109d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 121 (lane 8 & 9; MW 71kDa) and in Figure 184 (lane 2; MW 71kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
121 (lane 11; MW 46kDa), Figure 128 (lane 4; MW 46kDa) and Figure 179 (lane 7; MW 46kDa). 
GBS109d-His was purified as shown in Figure 232 (lanes 7 & 8). GBS109d-GST was purified as shown in 
Figure 236, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1342 

A DNA sequence (GBSx1426) was identified in S.agalactiae <SEQ ID 4119> which encodes the amino
acid sequence <SEQ ID 4120>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1343

A DNA sequence (GBSx1427) was identified in S.agalactiae <SEQ ID 4121> which encodes the amino
acid sequence <SEQ ID 4122>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4469 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9793> which encodes amino acid sequence <SEQ ID 9794>
was also identified.

The protein is similar to ORF20 from S.faecalis. No corresponding DNA sequence was identified in
S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1344 

A DNA sequence (GBSx1428) was identified in S.agalactiae <SEQ ID 4123> which encodes the amino 
acid sequence <SEQ ID 4124>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1367 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1345 

A DNA sequence (GBSx1429) was identified in S.agalactiae <SEQ ID 4125> which encodes the amino 
acid sequence <SEQ ID 4126>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.77 Transmembrane 39 - 55 ( 34 - 64) 
INTEGRAL Likelihood = -6.32 Transmembrane 16 - 32 ( 10 - 35) 

Final Results 

bacterial membrane --- Certainty=0.5310 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein is similar to ORF19 from S.faecalis. No corresponding DNA sequence was identified in 
S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1346 

A DNA sequence (GBSx1430) was identified in S.agalactiae <SEQ ID 4127> which encodes the amino 
acid sequence <SEQ ID 4128>. This protein is predicted to be an antirestriction protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2918 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein is similar to ORF18 from S.faecalis. No corresponding DNA sequence was identified in 
S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1347 

A DNA sequence (GBSx1431) was identified in S.agalactiae <SEQ ID 4129> which encodes the amino 
acid sequence <SEQ ID 4130>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -3.61 Transmembrane 75 - 91 ( 72 - 94) 

Final Results 

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein is similar to ORF17 from S.faecalis. No corresponding DNA sequence was identified in 
S.pyogenes. 

A related GBS gene <SEQ ID 8793> and protein <SEQ ID 8794> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: -7.12 
GvH: Signal Score (-7.5): -2.52 
Possible site: 43 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -3.61 threshold: 0.0 
INTEGRAL Likelihood = -3.61 Transmembrane 37 - 53 ( 34 - 56) 
PERIPHERAL Likelihood = 3.66 58 
modified ALOM score: 1.22 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

100.0/100.0% over 167aa 

Enterococcus faecalis 

EGAD|14977| hypothetical protein Insert characterized 
GP|532550|gb|AAB60016.1||U09422 ORF17 Insert characterized 

ORF00720(187 - 690 of 990) 

EGAD|14977|15011(1 - 168 of 168) hypothetical protein {Enterococcus faecalis} 
GP|532550|gb|AAB60016.1||U09422 ORF17 {Enterococcus faecalis} 
%Match = 50.3 
%Identity = 100.0 %Similarity = 100.0 
Matches = 168 Mismatches = 0 Conservative Sub.s = 0 

120 150 180 210 240 270 300 330 

L*AKYQLVFKTILIIKPMVGI*TFQERLSQPIMGFLKSSIKSVGTLLLADFLFYGVAQSATPIFYERIDYMKKIRSYTSI 

niiiiiiiiiiiiiiiiiiiiiiMiiiiiiiiiiiiiiiiiiiiii 

MGFLKSSIKSVGTLLLADFLFYGVAQSATPIFYERIDYMKKIRSYTSI 
10 20 30 40 

360 390 420 450 480 510 540 570 

WSVEK^YSINDFRLPFPITFTQMTWFWSLFAVWILGmPPLSMIEGAFLK^TGIPVAFTWFMSTKTFDGKKPYGFLKS 

ll]||]|||]||||||||IM!)|]|||||||||||||||]il!llllllll!l]]|||]|||]]]|lll!II!llll!l 

WSVEKVLYSINDFRLPFPITFTQNWWFWSLFAVMILGNLPPLSMIEGAFLKYFGIPVAFTWFMSTKTFDGKKPYGFLKS 
60 70 80 90 100 110 120 

600 630 660 690 720 750 780 810 

VIAYALRPIOLTYAGKKVTLGRNQPQEAITAVRSEFYGISN*IH*KQSRLE*RRGMLCLL*ACELQLLISKSRTENTSA*P 

llllllllllllllllllllllllllllllllllllllll 
VIAYALRPKLTYAGKKVTLGRNQPQEAITAVRSEFYGISN 
140 150 160 

SEQ ID 8794 (GBS223) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 7; MW 18kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1348 

A DNA sequence (GBSx1432) was identified in S.agalactiae <SEQ ID 4131> which encodes the amino 
acid sequence <SEQ ID 4132>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4292 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9791> which encodes amino acid sequence <SEQ ID 9792> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1349 

A DNA sequence (GBSx1433) was identified in S.agalactiae <SEQ ID 4133> which encodes the amino 
acid sequence <SEQ ID 4134>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.21 Transmembrane 350 - 366 ( 345 - 368) 
INTEGRAL Likelihood = -0.32 Transmembrane 171 - 187 ( 171 - 188) 

Final Results 

bacterial membrane --- Certainty=0.3484 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1350 

A DNA sequence (GBSx1434) was identified in S.agalactiae <SEQ ID 4135> which encodes the amino 
acid sequence <SEQ ID 4136>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-10.3 0 Transt 
INTEGRAL Likelihood =-10 
INTEGRAL Likelihood =-10 
INTEGRAL Likelihood = -7.43 Transmembrane 346 - 362 ( 337 - 
INTEGRAL Likelihood = -7.01 Transmembrane 186 - 202 ( 180 - 
INTEGRAL Likelihood = -5.36 Transmembrane 411 - 427 ( 404 - 
INTEGRAL Likelihood = -1.17 Transmembrane 386 - 402 ( 386 - 

Final Results 

bacterial membrane --- Certainty=0.5118 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1351 

A DNA sequence (GBSx1436) was identified in S.agalactiae <SEQ ID 4137> which encodes the amino 
acid sequence <SEQ ID 4138>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.6306 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1352 

A DNA sequence (GBSx1437) was identified in S.agalactiae <SEQ ID 4139> which encodes the amino 
acid sequence <SEQ ID 4140>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2973 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1353 

A DNA sequence (GBSx1438) was identified in S.agalactiae <SEQ ID 4141> which encodes the amino 
acid sequence <SEQ ID 4142>. Analysis of this protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3382 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

There is also homology to SEQ ID 4144. 

A related GBS gene <SEQ ID 8795> and protein <SEQ ID 8796> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 11.12 
GvH: Signal Score (-7.5): 0.27 
Possible site: 24 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 4.19 threshold: 0.0 
PERIPHERAL Likelihood = 4.19 69 
modified ALOM score: -1.34 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

100.0/100.0% over 332aa 

Enterococcus faecalis 

EGAD|36209| hypothetical protein Insert characterized 
GP|532547|gb|AAB60019.1||U09422 ORF14 Insert characterized 

ORF00727(301 - 1299 of 1599) 

EGAD|36209|37602(1 - 333 of 333) hypothetical protein {Enterococcus faecalis} 
GP|532547|gb|AAB60019.1||U09422 ORF14 {Enterococcus faecalis} 
%Match = 61.7 

%Identity = 100.0 %Similarity = 100.0 

Matches = 333 Mismatches = 0 Conservative Sub.s = 0 

249 279 309 339 369 399 429 459 

CSKSTTTKyKK*TTNQmHH*ESR*ETMKLKTLVIGGSGLFI^FSLLLF7AILFSDEQDSGISNIHYGGVWSAEVIiAH 

MimiiiiiiiiiimiiiMimmmiuiimiimiiiiii 

MKLKTLVIGGSGLFLMVFSLLLFVAILFSDEQDSGISNIHYGGVMVSAEVLAH 
10 20 30 40 50 


489 519 549 579 609 639 669 699 

KPMVEKYAKEYGVEEYVNILLAIIQVESGGTAEDVMQSSSSLGLPPNSLSTSESIKQGVKYFSELLASSERLSVDLESVI 

i ' I ' . I i I I Ml ' . I I I I . MM : : - I . ' ! I I I I 'Ml: 

KPMWKYAKEYGVEEYVNILIAIIQVESGGTAEDVMQSSESLGLPPNSLSTEESI^ 
70 80 90 100 110 120 130 

729 759 789 819 849 879 909 939 

QSYI^GGGFLGYVANRGNKYTFEIAQSFSKEYSGGEK^SYPNPIAIPINGGWRYNYG>MFWQLVTQYLVTTEFDDDrVQ 

lllllllllllllllllMMIIIIIIIIIllllNltllllllllllllllllllllllMIIIIIIIIIMlllllll 

QSYOTGGGFLGYVANRGNKYTFEIAQSFSKEYSGGEKVSYPNPIAIPINGGWRYNYGM^FWQIiVTQYLVTTEFDDDTVQ 
150 160 170 180 190 200 210 

969 999 1029 1059 1089 1119 1149 1179 

AIMDEALKTEGWRYVYGGASPTTSFDCSGLTQOTYGKAGINLPRTAQQQYDVTQHIPLSEAOAGDLVFFHSTYNAGSYIT 

IIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIMIIIII11IIIIIIIIIIIIIIIIIIIII 

AIMDFALICYEGITOYVYGGASPTTSFDCSGLTQWTYGKAGINLPRTAQQQYDVTQHIPLSEAOAGDLVFFHSTYNAGSY1T 
230 240 250 260 270 280 290 

1209 1239 1269 1299 1329 1359 1389 1419 

HVGIYLGNm^FHAGDPIGYADLTSPYWQQHLVGAGRIKQ*ERKI***NLEKIRIKKtTOYQRKRNLVSIRSlLIKRL*LP 

I1IIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIMIMI 

HVGIYLGNNRMFHAGDPIGYADLTSPYWQQHLVGAGRIKQ 
310 320 330 

SEQ ID 8796 (GBS155) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 24 (lane 10; MW 38kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 7; MW 62kDa). 

The GBS155-GST fusion product was purified (Figure 111; see also Figure 198, lane 74) and used to 
immunise mice (lane 1 product; 20 µg/mouse). The resulting antiserum was used for Western blot, FACS, 
and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1354 

A DNA sequence (GBSx1439) was identified in S.agalactiae <SEQ ID 4145> which encodes the amino 
acid sequence <SEQ ID 4146>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.60 Transmembrane 37 - 53 ( 35 - 55) 

Final Results 

bacterial membrane --- Certainty=0.4439 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9789> which encodes amino acid sequence <SEQ ID 9790> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1355 

A DNA sequence (GBSx1440) was identified in S.agalactiae <SEQ ID 4147> which encodes the amino 
acid sequence <SEQ ID 4148>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.00 Transmembrane 391 - 407 ( 391 - 407) 

Final Results 

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9787> which encodes amino acid sequence <SEQ ID 9788> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4149> which encodes the amino acid 
sequence <SEQ ID 4150>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2027 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 183/669 (27%), Positives = 305/669 (45%), Gaps = 63/669 (9%) 

Query: 7 KIINIGVLAHVDAGKTTLTESLLYNSGAITELC3SVDKGTTRTDNTLLERQRGITIQTGIT 66 
K NIG++AHVDAGKTT TE +LY +G I 4-+G +G ++ D E++RGITI + T 
Sbjct: 9 KTRNIGIMAHVDAGKTTTTERILYYTGKIHKIGETHEGASQMDWMEQEQERGITITSAAT 68 

[Alignment rows for Query residues 67-254 (Sbjct residues 69-308) are not legible in the source.] 

FPV GSA N G+ +++ -I 



Query: 255 CGNVFKIEYTKKRQRLAYIRLYSGVLHLRDSVRVSEKEKI KVTEMYTS INGELCK1 310 

FKI RL + R+YSGVL+ V + K K ++ +M+ + E I 
Sbjct: 309 AALAFKIMTDPFVGRLTFFRWSGVLNSGSYVMNTSKGKRERIGRILQMHANSRQE 1 365 

Query: 311 DRAYSGEIVILQN-EFLKLNSVIGDTKL^PQRKKIENPHPLLQTTVEPSKPEQREMLLDA 369 

+ Y+G+I + L D K + IE P P++Q VEP + + + A 

Sbjct: 366 ETVYAGDIAAAVGLKDTTTGDSLTDEKAKVILESIEVPEPVIQLMVEPKSKADQDKMGVA 425 

Query: 370 LLEISDSDPLLRYYVDSTTHEI1LSFLGKVQMEVISALLQEKYHVEIELKEPTVIYME-- 427 

L ++++ DP R + T E +++ +G++ ++V+ ++ ++ VE + P V Y E 
Sbjct: 426 LQKLAEEDPTFRVETNWTGETV1AGMGELHLDVLVDRMKREFKVEANVGAPQVSYRETF 485 

Query: 428 RPLKNAEYTIHIEVPPNPFWASIGLSVSPLPLGSGKQYESSVSLGYLNQSFQNAVMEGIR 487 

R A + ++4+PGG 4+E+++ G + + F AV +G+ 

Sbjct: 486 RASTQARGFFKRQSGGKGQFGDWIEFTPNEEGKGFEFENAIVGGWPREFIPAVEKGLI 545 

Query: 488 YGCEQG-LYGWMVTDCKICFKYGLYYSPV3TPADFRMLAPIA/LEQVLKKAGTELLEPYLS 546 
G L G+ + D K G Y+ S+ F++ A + L++ K A +LEP + 

Sbjct: 546 ESMANGVLAGYPMVDVICAICLYDGSYHDVDSSETAFKIAASLALKEAAKSAQPAILEPMML 605 

Query: 547 FKIYAPQEYLSRAYNDAPKYCANIVDTQLKNNEVILSGEIPARCIQEYRSDLTFFTNGRS 606 
I AP++ L + + NI++P+Y + LTGR 

Sbjct: 606 OTITAPEDNLGDVMGHVTARRGRVDGMEAHGNSQIVRAYVPLAEMFGYATVLRSATQGRG 665 

Query: 607 VCLTELKGY 615 

+ Y 
Sbjct: 666 TFMMVFDHY 674 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1356 

A DNA sequence (GBSx1441) was identified in S.agalactiae <SEQ ID 4151> which encodes the amino 
acid sequence <SEQ ID 4152>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2530 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1357 

A DNA sequence (GBSx1442) was identified in S.agalactiae <SEQ ID 4153> which encodes the amino 
acid sequence <SEQ ID 4154>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1358 

A DNA sequence (GBSx1443) was identified in S.agalactiae <SEQ ID 4155> which encodes the amino 
acid sequence <SEQ ID 4156>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1630 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1359 

A DNA sequence (GBSx1444) was identified in S.agalactiae <SEQ ID 4157> which encodes the amino 
acid sequence <SEQ ID 4158>. This protein is predicted to be an excisionase-related protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4481 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein is similar to transposon Tn916 from S.faecalis. No corresponding DNA sequence was identified 
in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1360 

A DNA sequence (GBSx1445) was identified in S.agalactiae <SEQ ID 4159> which encodes the amino 
acid sequence <SEQ ID 4160>. This protein is predicted to be a transposase. Analysis of this protein 
sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4626 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein is similar to the Tn1545 integrase from S. pneumoniae and to SEQ ID 578. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1361 

A DNA sequence (GBSx1446) was identified in S.agalactiae <SEQ ID 4161> which encodes the amino 
acid sequence <SEQ ID 4162>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.72 Transmembrane 18 - 34 ( 13 - 41) 
INTEGRAL Likelihood = -6.10 Transmembrane 58 - 74 ( 55 - 79) 
INTEGRAL Likelihood = -5.04 Transmembrane 97 - 113 ( 90 - 116) 
INTEGRAL Likelihood = -1.81 Transmembrane 78 - 94 ( 78 - 94) 
INTEGRAL Likelihood = -0.85 Transmembrane 145 - 161 ( 145 - 161) 

Final Results 

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
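
For illustration, the INTEGRAL lines of such a listing can be tabulated with a few lines of Python. This is a sketch only: `Segment`, `parse_segments` and `strongest` are hypothetical names, and treating the most negative likelihood as the strongest prediction is an assumption based on the listings here, which label negative values (against a threshold of 0.0) as INTEGRAL and positive values as PERIPHERAL.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


# Matches e.g. "INTEGRAL Likelihood =-10.72 Transmembrane 18 - 34 ( 13 - 41)".
# The bracketed outer range is ignored in this sketch.
SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(?P<score>-?\d+\.\d+)\s+"
    r"Transmembrane\s+(?P<start>\d+)\s*-\s*(?P<end>\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract every complete INTEGRAL line; truncated lines are skipped."""
    return [
        Segment(float(m.group("score")), int(m.group("start")), int(m.group("end")))
        for m in SEG_RE.finditer(text)
    ]


def strongest(segments: List[Segment]) -> Segment:
    """Return the segment with the most negative likelihood (assumption above)."""
    return min(segments, key=lambda s: s.likelihood)


SAMPLE = """\
INTEGRAL Likelihood =-10.72 Transmembrane 18 - 34 ( 13 - 41)
INTEGRAL Likelihood = -6.10 Transmembrane 58 - 74 ( 55 - 79)
INTEGRAL Likelihood = -5.04 Transmembrane 97 - 113 ( 90 - 116)
INTEGRAL Likelihood = -1.81 Transmembrane 78 - 94 ( 78 - 94)
INTEGRAL Likelihood = -0.85 Transmembrane 145 - 161 ( 145 - 161)
"""

segments = parse_segments(SAMPLE)
for seg in segments:
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

In the sample above, every predicted segment spans 17 residues, which makes truncated or damaged lines easy to spot when the same check is applied to other listings.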

The protein has homology with the following sequences in the GENPEPT database. 

[Escherichia coli K12] 
%) , Gaps = 9/174 (5%) 

Query: 24 LIATLVLWYLYKL GILNDSNELKDLVHKYEFWGPMI FIVAQIVQIVFPVT PGG 77 

L A L+ + +Y + +L D L+ L+ + F+G ++I+ 1+ + ++PG 

Sbjct: 24 LFACLI FALVIYAIHAFGLFDLLTDLPHLQTLIRQSGFFGYSLYILLFI IATLL - LLPGS 82 

Query: 78 VTTVAGFLIFGPTLGFIYNYIGIIIGSVILFWLVKFYGRKFVLLFM-DQKTFDKYESKLE 136 

+ +AG ++FGP LG + + I +S FL++ GR +L ++ TF E + 
Sbjct: 83 ILVIAGGIVFGPLLGTLLSLIAATLASSCSFLLARWLGRDLLLKYVGHSNTFQAIEKGIA 142 


Query: 137 TSGYEKFFIFCMASPISEADIMVMITGLSNMSIKRFVTIIMITKPISIIGYSYL 190 

+G + F I P+ P +1 GL+ ++ + I +T 1+ Y+ + 

Sbjct: 143 RNGID-FLILTRLIPLFPYNIQNYAYGLTTIAFyJPYTLISALTTLPGIVIYTVM 195 

35 A related DNA sequence was identified in S.pyogenes <SEQ ID 4163> which encodes the amino acid 

sequence <SEQ ID 4164>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -4.30 Transmembrane 8 - 24 ( 6-29) 
40 INTEGRAL Likelihood = -0.80 Transmembrane 57 - 73 ( 57 - 73) 

INTEGRAL Likelihood = -0.00 Transmembrane 86 - 102 ( 86 - 102) 



Final Results 

bacterial membrane Certainty=0 .2720 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 85/114 (74%) , Positives - 101/114 (88%) 

Query: 89 PTLGFIYNYIGI I IGSVILFWLVKFYGRKFVLLFMDQKTFDKYESKLETSGYEKFFIFCM 148 

P GFIYNY+GIIIGS+ LF LVK YGRKF+LLF++ KTF KYE +LET GYEK FIFCM 
Sbjct: 3 PVTGFIYNYVGIIIGSIALFLLV1CTYGRKPILLFVNDKTFYKYERRLETPGYEKLFIFCM 62 

Query: 149 ASPISPADIMVMITGLSNMSIKRFVTIIMITKPISIIGYSYLWIYGGDILKNFL 202 

ASP+SPADIMVMITGL++MS+KRFVTI++ITKPISIIGYSYL+I+G D++ FL 
Sbjct: 63 ASPVSPADIMVMITGLTDMSLKRFVTILLITKPISIIGYSYLFIFGKDVISWFL 116 
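
As a worked illustration of the alignment statistics, the short Python sketch below counts identical positions in a pair of aligned strings and converts counts to whole-number percentages. The function names are hypothetical, and truncating (rather than rounding) the percentage is an assumption; it happens to reproduce the 74% and 88% printed for this alignment.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count aligned positions holding the same residue; gap characters never count."""
    return sum(q == s and q != "-" for q, s in zip(query, sbjct))


def percent(part: int, total: int) -> int:
    """Whole-number percentage, truncated (assumption: matches the printed values)."""
    return part * 100 // total


# Counts reported for the GAS/GBS alignment above.
identity_pct = percent(85, 114)
positive_pct = percent(101, 114)

# First 16 aligned residues of the final alignment row (Query 149 / Sbjct 63).
query_part = "ASPISPADIMVMITGL"
sbjct_part = "ASPVSPADIMVMITGL"
matches = count_identities(query_part, sbjct_part)

print(identity_pct, positive_pct, matches, percent(matches, len(query_part)))
```

The "Positives" figure additionally counts conservative substitutions (the `+` marks in the middle line), which depend on the scoring matrix used and are therefore not recomputed here.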

There is also homology to SEQ ID 1728. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1362 

A DNA sequence (GBSx1447) was identified in S.agalactiae <SEQ ID 4165> which encodes the amino 
acid sequence <SEQ ID 4166>. This protein is predicted to be chloramphenicol acetyltransferase (cat). 
Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4725 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA86871 GB:U19459 VAT B [Staphylococcus aureus] 
Identities = 57/130 (43%), Positives = 81/130 (61%), Gaps = 4/130 (3%) 

Query: 57 IGAFCSIAQIsIVT--ITGLNHPTDHITTNPFIYYKSRGFINEDRADLIDEKKNGKVIIGND 114 

IG FC+IA+ + + G NH 4 ITT PF G+ + h D G ++GND 

Sbjct: 65 IGKFC^IAEGIEFIMNGANHROTSITTYPF-NIMGNGW-EKATPSLEDLPFKGDTVVGND 122 

Query: 115 vWIGTNVTILPSVTIGNGAIIGAGSVITKDIPDYAWAGTPAKIIK£RFSEEEITLUJAS 174 

VWIG NVT++P + IG+GAI+ A SV+TKD+P Y ++ G P++IIK RF +E I L 
Sbjct: 123 WIGQWTVMPGIQIGDGAIVAANSVVTKDVPPYRIIGGNPSRIIKKRFEDELIDYLLQI 182 

Query: 175 QWWNWSDEAI 184 
+WW+WS + I 

Sbjct: 183 KWWDW3AQKI 192 

There is also homology to SEQ ID 1944. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1363 

A DNA sequence (GBSx1448) was identified in S.agalactiae <SEQ ID 4167> which encodes the amino 
acid sequence <SEQ ID 4168>. Analysis of this protein sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2398 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 


WO 02/34771 PCT/GB01/04789 



Example 1364 

A DNA sequence (GBSx1449) was identified in S.agalactiae <SEQ ID 4169> which encodes the amino 
acid sequence <SEQ ID 4170>. This protein is predicted to be cation-transporting P-ATPase PacL. 
Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -9.18 Transmembrane 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = -5.95 Transmembrane 
INTEGRAL Likelihood = 



INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = -3 
Likelihood = -C 
Likelihood = -C 



873 - 889 
257 - 273 



Transmembrane 282 - 298 
Transmembrane 90 - 10 S 
Transmembrane 737 - 753 
Transmembrane 898 - 914 

866 - 894 
251 - 276! 

736 - 753 
898 - 914 

Final Results 

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 


A related GBS nucleic acid sequence <SEQ ID 10963> which encodes amino acid sequence <SEQ ID 
10964> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



3:AE000912 cation-transporting P-ATPase PacL 
[Methanothermobacter thermoautotrophicus ] 
Identities = 409/922 (44%), Positives = 609/922 (65%), Gaps = 22/922 (2%) 





[Alignment rows for Query residues 10-549 (Sbjct residues 4-532) are only partly legible in the source; the surviving fragments follow.] 



E I + +V GDI+++EEGD + AD R4-44-4- +L+V+ SALTGES P+ K E 



1+ +N++FAGT V+SG+ 



V A G T+E +IA LTQ 4 



■ I +++ +G+I FL 



h V+ P+ +FIFA+G++VA +PEGLLP+VTLSLA 4 



4-4-MA+E4-ALVK4-LSSVETLG4T4-4- 1 C+-DKTGTLT+ EMTV 4-W K 4-VTG GY PE 



4-4-VLGD TE L V 



h L+R A C++4- 4 



EKSGINIQETOKFAPRLKELPFDSTOKRMTTIHSLGGDEKDKKI S ITKGAPKEILDLSDY 4 
EK G 4- 4- K PR4- ELPFDS RK MT4-IH G K4-4-4- . KGAPK+H- LS4- 

-KRVAYVKGAPKKI IGLSER 4 



DG+V L+ +E+ 4-1 +D A GLRVLA 4-Y + 

Query: 550 GLIAMSDPPREGWEAIDKCHAASIRIIMVTGDYGIjTA13IAKNIGIIROT)DAKVISGLE 609 

G+ AM DPPREGV+EA++ C A IRIIM+TGDYGLTA +IA+ IGI+ + ++I G E 
Sbjct: 533 GMAAMHDPPREGVKEAVEHCKTAGIRIIMITGDYGLTAEAIAREIGIVEG-ECRIIKGKE 591 

Query: 610 LSEMTDSQLKKELSGE--WFARVAPEQICYRWTILQEMGEVVAVTGDGVNDAPALKKSD 667 

L ++ D++L4 L+ E ++FAR PE K R+ ++L++ E+VA+TGDGVNDAPAL+K+D 
Sbjct: 592 LDKLKDTELRGILARERNLIFARAVPEHI01RIAS\TjEDSDEIVAMTGDGVNDAPALRI<AD 651 

Query: 668 IGVAMGVTGTDVAKESADMILTDDHFASIVHRVEEGRAVYQMIKKFLTYIFHSNTPEAVP 727 

IGVAMG +GTDVAKE+AD++L DD+FASIV AV EGR VY+NI+KF+TYIF+ T E VP 
Sbjct: 652 IGVAMG-SGTOVAKEAADIVLADDNFASIVTAVREGRWYENIRKFITYIFSHETAEIVP 710 

Query: 728 SAFFLFSKGF1PLPLTVMQILAVDLGTDMLPALGLGVEPPETDVMNRPPRRLTDRLLDKG 787 

F + IPLP+T+MQILA+DLGTD IiPAL LG PE+DVM PPR ++RLL++ 

Sbjct: 711 - -FIMMVLFSIPLPITIMQIIAIDLGTDTLPALALGRSLPESDWIKLPPRAPSERLIjNRE 768 

Query: 788 LL I KSFLWYGT I ES VLAMGGFFWAHYLRYGNF TFFVANGIPYREATTMTLGAI I FSQ 844 

++++ +L+ GTIE+ L M +F Y G + A+ Y ATT+ 1+ +Q 

Sbjct: 769 VILRGYLFTGTIFAALIMAAYFLVLY--SGGWLPGQELSASDPLYMRATTWFAGIVMAQ 826 

Query: 845 IGMVMNSRTSYQSIKALSIFGNKLlNFGIIMEIIAFLVLVWPLFHm.FMTASLGLSHWL 904 

+G +++S+T S + N+ I G++ I L+++Y+P +F TA G+ W 

Sbjct: 827 LGNLLSSQTLRSSALEAGLLRNRWILAGMVFAISVMLLVIYLPPLQPIFGTAPPGILEWF 886 

Query: 905 YLISCPFIMIGLDEVRKLFSSR 926 

LI 1+ DE+RK R 
Sbjct: 887 ILILFTPIVFLTDEMRKFIQRR 908 

There is also homology to SEQ ID 4172. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1365 

A DNA sequence (GBSx1450) was identified in S.agalactiae <SEQ ID 4173> which encodes the amino 
acid sequence <SEQ ID 4174>. Analysis of this protein sequence reveals the following: 

Final Results 

bacterial cytoplasm --- Certainty=0.3740 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 2 KETQEELRQRIGHTAYQVTQNSATEHAFTGKYDDFFEEGIYVDIVSGEVLFSSLDKFQSG 61 

K T+EEL+Q + Y VTQ +ATE F+G+YDDF+++GIYVDIVSGE LFSSLDK+ +G 
Sbjct: 3 KPTEEELKQTLTDLQYAVTQENATERPFSGEYDDFYQDGIYVDIVSGEPLFSSLDKYDAG 62 

Query: 62 CGWPAFSKPIFJJRMVTNHQDHSKGMHRIEVRSRQADSHLGHVFNDGPVDAGGLRYCINSA 121 

CGWP+F4KPIE R V D SHGMHR+EVRS4+ADSHLGHVF DGP+ GGLRYCIN+A 
Sbjct: 63 CGWPSFTKPIEKRGVKEKADFSHGMHRVFjWSQEADSHIjGHVFTDGPLQEGGLRYCINAA 122 

Query: 122 ALDFIPYDQMAK 133 

AL F+P + K 
Sbjct: 123 ALRFVPVADLEK 134 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4175> which encodes the amino acid 
sequence <SEQ ID 4176>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3692 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 109/142 (76%), Positives = 126/142 (87%) 

Query: 3 ETQEELRQRIGHTAYQVTQNSATEHAFTGXYDDFFEEGIYVDIVSGEVLFSSLDKFQSGC 62 
ET +EL+QRIG +Y+VTQ++ATE FTG+YD+FFE+GIYVDIVSGEVLFSSLDKF SGC 

Sbjct: : 



Query: 123 LDFIPYDQMAKRGYGDYLSLFD 144 

L FIPYDQM K GY +L+LFD 
Sbjct: 122 LKFIPYDQMEKEGYAQWLTLFD 143 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
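
Each alignment in this document is introduced by a summary line of the form "Identities = 109/142 (76%), Positives = 126/142 (87%)", optionally followed by a "Gaps" term. As an illustration only, the following Python sketch shows how such a line might be parsed so that the fractions can be recomputed from the raw counts. The function name `parse_summary` and the dictionary keys are hypothetical and are not part of the analysis described in this document.

```python
import re

# Summary line: Identities and Positives are always present, Gaps is optional.
SUMMARY_RE = re.compile(
    r"Identities\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)\s*,\s*"
    r"Positives\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
    r"(?:\s*,\s*Gaps\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\))?"
)


def parse_summary(line):
    """Parse one summary line; return None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    identities = int(match.group(1))
    length = int(match.group(2))
    positives = int(match.group(4))
    summary = {
        "identities": identities,
        "positives": positives,
        "length": length,
        # Fractions are recomputed from the counts, not read from the
        # printed percentages.
        "identity_fraction": identities / length,
        "positive_fraction": positives / length,
    }
    if match.group(7) is not None:
        summary["gaps"] = int(match.group(7))
    return summary


if __name__ == "__main__":
    print(parse_summary("Identities = 109/142 (76%), Positives = 126/142 (87%)"))
```

Recomputing the fractions from the counts avoids relying on the printed percentages, which may be affected by transcription errors.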

Example 1366 

A DNA sequence (GBSx1451) was identified in S.agalactiae <SEQ ID 4177> which encodes the amino 
acid sequence <SEQ ID 4178>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1674 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MIRRAKEKDLPDrAELLKQILMLHHEVRPDIFETRGSKFSKKQLKEMLIDESKPIFVYES 60 

+IR A +D ++A L Q+ H + R DIF + + + + E + V+ 

Sbjct: 2 IIREATVQDYEEVARLHTQVHEAHVKERGCIFRSNEPTLNPSFFQAAVQGEKSTVLVFVD 61 

Query: 61 DEGKWAHLFLQLQEKPJ3LPR-KSFKTLYIDDLCIDEEVRGQQ1GQKLMDFARQYAKKHG 119 

+ K+ A+ + L + LP + KT+YI DLC+DE RG IG+ ' + + Y K H 
Sbjct: 62 EREKIGAYSVIHLVQTPLIjPTMQQRKTVYISDLCVDETRRGGGIGRLIFEAIISYGKAHQ 121 

Query: 120 CYNITLNvWNDNQRAVSFYEKLGFKPQQTQME 151 

I L+V++ N RA +FY LG + Q+ ME 
Sbjct: 122 VDAIELDVYDFNDRAKAFYHSLGMRCQKQTME 153 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1367 

A DNA sequence (GBSx1452) was identified in S.agalactiae <SEQ ID 4179> which encodes the amino 
acid sequence <SEQ ID 4180>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3285 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9785> which encodes amino acid sequence <SEQ ID 9786> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06554 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 108/211 (51%), Positives = 149/211 (70%) 





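[alignment residues not recoverable from the source; block coordinates and surviving match-line fragments are kept]

Query: 7 
Sbjct: 3 

Query: 67 
Sbjct: 63 

Query: 127 
Sbjct: 123 

Query: 187 
Sbjct: 183 

Match-line fragments: 
DK+ D + A +L W+ + + +++H +DI+ ISFK G G ++ T E +VQDAD 
RLDA+GAIGIART AYSG+KG+ I-DP L RE +TtEEYR+G+ TAI HFYEKL KLKD 
] ,MNT+ GK LA++RH I 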



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1368 

A DNA sequence (GBSxl453) was identified in S.agalactiae <SEQ ID 4181> which encodes the amino 
acid sequence <SEQ ID 4182>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein 



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

!GB:U25448 internalin [Listeria monocytogenes] 




>GP:AAA69530 GB:U25448 internalin [Listeria monocytogenes] 
Identities = 78/253 (30%), Positives = 132/253 (51%), Gaps = 2/253 (0%) 

Query: 531 LKQLWMTNTGITDYSFLDKMPLLEG-LDISQNGIKDLSFLTKYKQLSLIAAANNGITSLKP 590 

L Q+ +N +TD + L + L + ++ N I D++ L L+ + UN IT + P 

Sbjct: 26 LTQINFSNNQLTDITPLKDLTKLVDILMNMJQIlffilTPIiMILSNLTGLTLFlWQITDIDP 85 

Query: 591 LAELPNLQFLVLSHNNISDLTPLSNLTKLQELYLDHNNVKNLSALSGKKDLK^DLSNNK 650 

L L NIi LLSB ISD++ LS LT LQ+L h N V +h L+ L+ LD+S+NK 
Sbjct: 86 LKNLTNIiNRLELSSNTISDISALSGLTSLQQLSLG-NQVTDLKPLftNLTTLERLDISSNK 144 

Query: 651 SADLSTD-KTTSLETLLIMTMTSNLSFLKQNPKVSNIjTINNAKLASLDGIEESDEIVKV 709 

+D+S h K T+LE+L+ S+++ L + L++N +L + + + + 

Sbjct: 145 VSDISVLAKLTNLESLIATNNQISDITPLGILTNLDELSLNGNQLKDIGTLASLTNLTDL 204 



Sbjct: 205 DLMMQISNLAPLPGLTKLTELKLGANQISNIXPLAGLTALTNLELNENQLEDISPISNL 264 

Query: 770 KTVTNLDFSHNNV 782 

K +T L NN+ 
Sbjct: 265 KNLTYLTLYFNNI 277 
Identities = 91/300 (30%), Positives = 141/300 (46%), Gaps = 42/300 (14%) 

Query: 519 im)MTPVLQFKKLKQLWMTNTGITDYSFLDKI«IPLLEGLDISQNGIKD---LSFLTKYKQL 575 

I D+TP+ L L + N ITD L + h L++S N I D LS LT +QL 

Sbjct: 58 IMDITPLANLSNLTGLTLFNNQITDIDPLro&TOLmLELSSNTISDISALSGLTSLQQL 117 

Query: 576 SLIAAANNGITSLKPLA ELPNLQFLVLSHNNISDLTPL 613 

Sbjct: 118 SL GNQOTDLKPLANLTTLERLDISSNKVSD1SVLAKLTNLESLIATNNQISDITPL 173 

Query: 614 SITCjTKLQELYIBHHOTKNLS^ 672 

LT L EL L+ N +K++ L+ +L LDL+NN+ ++L+ L T L L L 
Sbjct: 174 GILTNLDELSI^GNQLI<DIGTIiASLTl&TDLDIjANNQIS]SrLAPLPGLTKLTELKLGANQI 233 

Query: 673 SNLSFLKQNPKVSNLTINNAKLASLDGIEESDEIVKVEAEGNQIKSLVLKNKQGSLKFLN 732 

SN+ L ++NL +N+L + I + + NI++ L+L 

Sbjct: 234 SNIXPLftGLTALTNLELNENQLEDISPISNLKNLTYLTLYFNNISDISPVSSLTKLQRLF 293 

Query: 733 VTIOTQLTSLEGVMJYTSLETLSVSKNKLESLDIKTPNKTV^ 792 
NN+++ + + N T++ LS M++ L TP +T + +QL LN++ 

Sbjct: 294 FYNNKVSDVSSLAWLTWINWLSAGHNQISDL---TPLANLTRI TQLGLNEQ 341 

Identities = 73/253 (28%), Positives = 124/253 (48%), Gaps = 4/253 (1%) 

Query: 540 GITDYSFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAANNGITSLKPLAELPNLQF 599 

GI L+ 4 L ++ S N + D++ L +L I NN I + PI* L NL 

Sbjct: 13 GIKSIDGLEYUm,TQINFSI^QLTDITPLKDLTKLVT>ILMNmQIADITPLANLSNLTG 72 

Query: 600 LVLSHNNISDLTPLSNLTKLQELYLDHHTTVTO^SALSGKKDLKVLDLSNWKSADLSTLKT 659 

L L +N I+D+ PL NLT L L L N + ++SALSG L+ L L N + 
Sbjct: 73 LTLFNNQITDIDPLKNLTNLNRLELSSjNTISDISALSGLTSLQQLSLGNQVTDLKPLANL 132 

Query: 660 TSLETLLI^ffiTNTSI^SFLKQNPKATSNLTIl^AKIASLDGIEESDEIVKVEAEGNQIKSL 719 

T+LE L ++ S++S L + + +L N +++ + + + ++ GNQ+K + 

Sbjct: 133 TTLERLDISSNKVSDISVIJAKLTNLESLIAraNQISDITPLGILTNLDELSLNGNQLKDI 192 

Query: 720 VX 1 KNKC^SLKFI,NVTNNQLTSLEGV]>nOTSLETLSVSKNKLESLDIKTPNKTVTNLDFSH 779 

+L L++ NNQ+++L + T L L + N++ ++ +TNL+ + 

Sbjct: 193 GTLASLT2^TDLDIANNQISISn^PLPGLTK^^ 252 

Query: 780 NNV PTSQLK 788 

N + P S LK 
Sbjct: 253 NQLEDISPISNLK 265 
Identities = 56/209 (26%), Positives = 115/209 (54%), Gaps = 2/209 (0%) 

Query: 575 LSLIAAANWGITSLKPIiAELPNLQFLVliSHNNlSDLTPLSNLTKLQELYLDHNNV^NLSA 634 

++ + A GI St L L NL + S+N ++D+TPL +LTKL ++ +++N + +++ 
Sbjct: 4 WTLQADRLGIKSIDGLEYLNNLTQINFSNNQLTDITPLKDLTKLVDILMNNNQIADITP 63 
Query: 635 LSGKKDIiKVLDLSNNKSADLSTLKT-TSLETLLLNETl'n?SNLSFLKQNPKVSNLTINNAK 693 

L+ +L L L NN+ D+ LK T+L L L+ S++S L + L++ N + 

Sbjct: 64 LANLSNLTGLTLF1MQITDIDPLKNLTNLNRLELSSNTISDISALSGLTSLQQLSLGN-Q 122 

Query: 694 l^SLDGIEESDEIVKVEAEGNQIKSLVLKircQGSLKFL^^ 753 

+ L + + +++ N++ + + K +L+ L TNNQ++ + + T+L+ L 

Sbjct: 123 VTDLKPLANLTTLERLDISSNICVSDISVIAKLTNLSSLIATN1IQISDITPLGILTNLDEL 182 

Query: 754 SVSKNKLESLDIKTPNKTVTNLDFSHNNV 782 

S++ N+L+ + +T+LD ++N + 

Sbjct: 183 SLNGNQLKDIGTIASLTNLTDLDLANNQI 211 
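Identities = 61/228 (26%), Positives = 118/228 (51%), Gaps = 3/228 (1%) 

[alignment residues not recoverable from the source; block coordinates and surviving match-line fragments are kept]

Query: 483 
Sbjct: 111 

Query: 543 
Sbjct: 169 

Query: 603 
Sbjct: 229 

Query: 662 
Sbjct: 289 LQRLFFYNNKVSDVSSLANLTNINWLSAGHNQISDLTPLANLTRITQL 336 

Match-line fragments: 
L+ L TN 1+ 
ANN I++L PL L L L L 
N IS++ PL+ LT L' L L+ N ++++S +S K+L L L N +D+S + 
L+ L S++S L ++ L+ + +++ L + I ++ 

Identities = 60/286 (20%), Positives = 129/286 (44%), Gaps = 24/286 (8%) 

[alignment residues not recoverable from the source; block coordinates and surviving match-line fragments are kept]

Query: 369 
Sbjct: 77 

Query: 
Sbjct: 124 

Query: 487 
Sbjct: 177 

Query: 545 
Sbjct: 237 

Sbjct: 297 

Match-line fragments: 
G L+ + N+ L 4 
L L++++N ++D+S 1 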



A related DNA sequence was identified in S. pyogenes <SEQ ID 4183> which encodes the amino acid 
sequence <SEQ ID 4184>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> May be a lipoprotein 



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA69530 GB:U25448 internalin [Listeria monocytogenes] 
Identities = 88/279 (31%), Positives = 149/279 (52%), Gaps = 2/279 (0%) 

Query: 419 LPNLETLGIGFTPIKDISPVLQFKKLKQLLMTKTGVTDYRFLDNMPQLEGIDISQHNLKD 478 

L + TL IK I + L Q+ + +TD L ++ +L I ++ N + D 

Sbjct: 1 LDXVTTLQADRLGIKSIDGLEYLITOL^ 60 

Query: 479 ISFLSKYKHLTLVAAADNGIEDIRPLGQLPNLKFLVLSHMKISDLSPLASLHQLQELHID 538 

1+ L+ NLT + +N I DI PL L Nh L LS+N ISD+S L+ h LQ+L + 
Sbjct: 61 ITPLAIttSNLTGLTLFNNQITDIDPLICNIjrNLNRLELSSNTISDISALEGLTSLQQLSL- 119 

Query: 539 mQITDLSPVSHKESLTVVDLSRNADVDLATL-QAPKLETL^WOT3TKVSHLDFLKWNPI^ 597 

NQ4-TDL P4-4+ +L 4-D4-S N D4-t- L + LE+L+ 4- ++S + L NL 
Sbjct: 120 GNQVTDLKPLffiNLTTLERLDISSNICVSDISVIAKLTNLESLIAraNQISDITPLGILTNL 179 

Query: 598 SSLSIOT^QLQSLEGIEASSVITOVJ^GNQIKSLVLKDKQGSLTFLDVTGKQLTSLEGV 657 

LS+N QL+ + + + + + ++ NQI +L LT L + NQ++++ + 

Sbjct: 180 DELSIjNGNQLKDIGTIASLTNLTDLDIAmQISNLAPLPGLTKLTELKLGANQISNIXPL 239 

Query: 658 NNFTALDILSVSKNQLTNVNLSKPNKT , TOIIDISKNNIS 695 

TAL L +++NQL +++ K +T + + NNIS 

Sbjct: 240 AGLTALTMLELNENQLED I SPI SNLKNLTYLTIYFNNI S 278 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 346/753 (45%), Positives = 472/753 (61%), Gaps = 63/753 (8%) 

Query: 187 SRLGNQSNSHYRVNSSK IAGLHYPTSNGFLFNGRG- IKGTTPTGILVEHHNH 237 

SR G SN + SK +AG+ +PT +GF+ I T GI+V+H H 

Sbjct: 38 SRKGMTSNKIKPIKKSKKTNKTHKGVAGVDFPTDDGFILTKDSKILSKTDQGIWDHDGH 97 

Query: 238 LHFISFADLRKGGW GS IADRYQPQKKADSKKQS PS SKKPRTENTLPKD I - - KDK 289 

HFI +ADL+ + G+ + ++A S+ S + P DI +D 

Sbjct: 98 SHFIFYADLKGSPFEYLIPKGASLAKPAVAQRaASQGTSKVADPHHHYEFNPADIVREnA 157 

Query: 290 LAYLARE LHIiDI SRIRVLKTLNGEIGFEYPHDDHT 324 

LYR H+ S+TNGG+PD 

Sbjct: 158 LGYTVRHDDHFHYILKSSLSGQTQAQAKQVATRLPQTSSLVSTATANGIPGLHFPTSDGF 217 

Query: 325 HVIMAKDIDLSKPIPNPHHDDEDH HKGHHHD- - -ESDHKHEEHEHTK 368 

+ +4K HD H H +D +++ E H+ + 

Sbjct: 218 QTOGQGIVGOTKDSILvDHDGHLHPISFADLRQGGWAHVADQYDPAKKAEKPAETHQTPE 277 

Query: 369 SNKLSDEDQKKLIYLAEECLGLNPNQIEVLTSEDGSIIFKYPHDDHSHTIASKDIEIGKPI 428 

++ E Q+KL YLAEKLG++P+ 1+ + ++DG + +YPH DH+H + DIEIGK I 
Sbjct: 278 LSEREKEYQEKLAYIAEtOLGIDPSTIKRVETQDGKLGLEYPHHDHAHVLMLSpiEIGKDI 337 

Query: 429 PDGH HDHSHAKDKVGMATLKQIGFDDEIIQDILHA-DAPTPFPSNETNPEKMRQWLA 484 

PD H H K KVGM TL+ +GFD+E+I DI+ DAPTPFPSNE +P M++WLA 

Sbjct: 338 PDPHAIEHARELEKHKVGMDTLRALGFDEEVILDIWTHDAPTPFPSI^IEKDPNMMKEWLA 397 

Query: 485 TVTKINIGQRTWPFQRFGLSLMPNIEVLGIGFTPIlTOMTPVLQFKICLKQLlflMTOTGITDY 544 

TV K+++G R +P QR GLSL+PN+E LGIGFTPI D++PVLQFKKLKQL MT TG+TDY 
Sbjct: 398 WIKLDLGSRKDPLQRKGLSLLPNLETLGIGFTPIKDISPVLQFraO<KQLLMTKTGVTDY 457 

Query: 545 SFLDKMPLLEGLDISQNGIKDLSFLTKYKQLSLIAAAHNGITSLKPLAELPNLQFLVLSH 604 

FLD MP LEG+DISQN +KD+SFL+KYK L+L+AAA+NGI ++PL +LPNL+FLVLS+ 
Sbjct: 458 RFmNMPQLEGIDISQNNLKDISFLSKYKRLTLVAAADNGIEDIRPLGQLPNLKFLVLSN 517 

Query: 605 NNISDLTPLSI^TKLQELYLDHIMVKNLSALSGKKDLKVTiDL 664 

N ISDL+PL++L +LQEL++D+N + +LS +S K+ L V+DLS N DL+TL+ LET 
Sbjct: 518 NKISDLSPIASLHQLQELHIDHNQITDLSPVSHKESLTVVTDLSRNADVDLATLQAPKLET 577 

Query: 665 LL1LNETMTSHLSFLKQNPKVSNLTINNAKIASLDGIEESDEIVKVEAEGNQIKSLVLKNK 724 

L++N+T S+L FLK NP +S+L+IN A+L SL+GIE S IV+VEAEGNQIKSLVLK+K 
Sbjct: 578 LMVlTOTKVSHLDFLK^PNLSSLSINRAQI^SLEGIEJ^SVIWVFJffiGKQIKSLvIjKDK 637 

Query: 725 QGSLKFUWTOTQLTSLEGvNNYTSLETLSVSKNKLESLDIKTPN^ 784 

QGSL FL+VT NQLTSLEGVNN+T+L+ LSVSKN+L ++++ PNKTVTN+D SHNH+ 
Sbjct: 638 Q^SLTFLDVTGNQLTSLEGVMJFTALDILSVSKIIQLTNVNLSKPNKTVTNIDISHNNISL 697 
Query: 844 1 

+ E+ D H+HE+ + A +H 

Sbjct: 758 YEDEEGHftHBHRDKDDHDHEHEDENEAKDEQNH 790 

SEQ ID 4182 (GBS84) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 9; MW 97.6kDa). 

GBS84-His was purified as shown in Figure 194, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1369 

A DNA sequence (GBSx1454) was identified in S.agalactiae <SEQ ID 4185> which encodes the amino 
acid sequence <SEQ ID 4186>. This protein is predicted to be GTP-binding protein LepA (lepA). Analysis of 
this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.1962 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 8 KKQEKIRNFSIIAHIDHGKSTI^RILEKTETVSSREMQAQLLDSNDLERERGITIKLNA 67 

+RQ + IRNFSIIAHIDHGKSTLADRILEKT +4 REM+ QLLDSMDLERERGITIKLN+ 
Sbjct: 9 ERQSRIRNFSIIAHIDHGKSTLADRILEKTSAITQREMKEQLLDSMDLERERGITIKLNS 68 

Query: 68 IEUTCTAKDGETYIFHLIDTPGHVDFTYEVSRSLAACEGAI^ 127 

++L Y AKDGE YIFHLIDTPGHVDFTYFA?SRSLAACEGAILWD7a7aQGIERQTLHNVYL 
Sbjct: 69 VQLKYKAKDGEEYIFHLIDTPGHVDFTYEVSRSI^CEGAILVVDAAQGIEAQTIANVYL 128 

Query: 128 ALDNDLEILPVINKIDLPAADPERVRAEVEDVIGLDASEAVLaSRKAGIGIEEILEQIVE 187 

ALDNDLEILPVINKIDLP+A+PERVR EVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 
Sbjct: 129 ALDNDLEILPVINKIDLPSAEPERVRQEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 188 

Query: 188 KVPAPTGEVDAPLQALIFDSVYDAYRGVII^TOIVNGMVKPGDKIQMMSNGKTFDVTEVG 247 

KVPAPTG+ +APL+ALIFDS+YDAYRGV+ +R+V G VKPG KI+MM+ GK F+VTEVG 
Sbjct: 189 K^PAPTGDPFAPLKALIFDSLYDAYRGWAYIRVVEGTVKPGQKIKMMATGKEFEVTEVG 248 

Query: 248 IFTPKAVGRDFIATGDVGYIAASIKOTAl^VGin'ITIANNPAIEPLHGYKQMNPMVFAG 307 

+FTPKA + L GDVG++ ASIK V DTRVGDTIT A NPA E L GY+++NPMV+ G 
Sbjct: 249 VFTPKATPI^LWGDVGFLTASIKWGOTRVGDTITSAANPAEFJUjPGYRKLNPMVYCG 308 

Query: 308 LYPIESNKYNDLREALEKLQIiNDASLQFEPETSQALGFGFRCGFLGIjLHMDVIQERLERE 367 

LYPI++ KYNDLREALEKL4IJ!1D+SLQ4-E ETSQALGFGFRCGFLG+LHM++IQER+ERE 
Sbjct: 309 LYPIDTAKY^LREALEKLELNDSSLQYEAETSQALGFGFRCGFLGMLHMEIIQERIERE 368 

Query: 368 FWIDLIMTAPSWYHVNTTDGEMLEVSNPSEFPDPTRVDSIEEPYVKAQI>WPQEFVGAV 427 

FNIDLI TAPSV+Y V TDGE + V NPS PDP +++ +EEPYVKA +MVP ++VGAV 
Sbjct: 369 FNIDLITTAPSVIYDVYMTDGEKV^/VDNPSNMPDPQKIERVEEPYVKATMMVPNDYVGAV 428 

Query: 428 MEIAQRKRGDFVTMDYIDDNRWVIYQIPLAEIVFDFFDKLKSSTRGYASFDYEISEYRR 487 
MEL Q KRG+F+ M Y+D NRV++IY +PLAEIV++FFD+LKSST+GYASFDYE+ Y+ 
Sbjct: 429 MELCQGKRGNFIDMQYLDANRVSIIYDMPIAEIVYEFFDQLKSSTKGYASFDYELIGYKP 488 

Query: 488 SQMKMDILMGDKVDALSFIVHKEFAYBRGKLIVDKLKKIIPRQQFEVPIQAAIGQKIV 547 

S+L KMDI +IMG+K+DALSFIVH+++AYERGK+IV+KLK++ 1 PRQQFEVP+QAAIGQKIV 
Sbjct: 489 SKBVKMDIMLMGEKIDALSFIVHRDYAYERGIOTIVEKLKELIPRQQFEVPVQAAIGQKIV 548 

Query: 548 ARSDIKALRKOTIAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLSVLSMDDD 607 

ARS IKA+RKNVLAKCYGGD+SRKRKLLEKQK GK+RMK +GSVEVPQEAF++VL MDD 
Sbjct: 549 ARSTIKAMRKNVLAKCYGGDISRKRKLLEKQKEGKRRMKQVGSVEVPQEAFMAVLKMDDS 608 

Query: 608 DKK 610 
KK 

Sbjct: 609 PKK 611 

A related GBS sequence was identified <SEQ ID 10775> which encodes the amino acid sequence <SEQ ID 
10776>. A further related GBS nucleic acid sequence <SEQ ID 10955> which encodes amino acid 
sequence <SEQ ID 10956> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4187> which encodes the amino acid 
sequence <SEQ ID 4188>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.1829 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB14493 GB:Z99117 GTP-binding protein [Bacillus subtilis] 
Identities = 463/603 (76%), Positives = 542/603 (89%) 

Query: 8 KRQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERGITIKLNA 67 
+RQ + IRNFSIIAHIDHGKSTLADRILEKT ++ REM+ QLLDSMDLERERGITIKLN+ 
Sbjct: 9 ERQSRIRNFSIIAHIDHGKSTLADRILEKTSAITQREMKEQLLDSMDLERERGITIKLNS 68 

Query: 68 IELNYTAKDGETYIFHLIDTPGHvDFTYEVSRSLAACEGAILVVDAAQGIEAQTLANVYL 127 
++L Y AKDGE YIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQTLANVYL 
Sbjct: 69 VQLKYKAKDGEEYIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQTLANVYL 128 

Query: 128 ALDNDLEILPVINKIDLPAADPERVRHEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 187 
ALDNDLEILPVINKIDLP+A+PERVR EVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 
Sbjct: 129 ALDNDLEILPVIKKIDLPSAEPERVRQEVEDVIGLDASEAVLASAKAGIGIEEILEQIVE 188 


Query: 188 KVPAPTGDVDAPLQALIFDSVYDAYRGVILQVRIVKGIVKPGDKIQMMSNGKTFDVTEVG 247 

KVPAPTGD +APL+ALI FDS + YDAYRGV+ +R+V G VKPG KI+MM+ GK F+VTEVG 
Sbjct: 189 KVPAPTGDPEAPIiKALIFDSIiYDAYRGWAYIRVTOGTVTCPGQKIKWlATGKEFEVTEVG 248 

Query: 248 IFTPKAVGRDFLATGDVGWARSIKOTADTRVGDTVTIiANNPAKEALHGYKQMNPMVFAG 307 

+FTPKA + L GDVG++ ASIK V DTRVGDT+T A NPA+EAL GY+++NPMV+ G 
Sbjct: 249 VFTPKATPTNELTVGDVGFLTASIKNVGDTRVGDTITSAANPAEEALPGYRKLNPMVYCG 308 

Query: 308 IYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVIQERLERE 367 
+YPI++ KYNDLREALEKL+LND+SLQ+E ETSQALGFGFRCGFLG+LHM++IQER+ERE 

Sbjct: 309 LYPIDTAKYNDLREALEKLELNDSSLQYEAETSQALGFGFRCGFLGMLHMEIIQERIERE 368 

Query: 368 FT^IDLIICPAPSVVYHVHTTDEDMIETOOTSEFPDPTR^ZAFIEEPYVKAQI^WPQEFVGAV 427 
FNIDLI TAPSV+Y V+ TD + + V NPS PDP ++ +EEPYVKA +MVP ++VGAV 

Sbjct: 369 FNIDLITTAPSVIYDVYMTW3EKV\fin^PS 428 

Query: 428 MELSQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDYDMSEYRR 487 



